Science in the community


Randomized controlled trials are providing evidence about interventions in health, education and 
international development, but they are only part of a suite of useful tools. 


The past decade has seen an explosion of interest in the use of randomized controlled experiments to test public policies, on issues from health and public safety to agriculture and education. Practitioners are generating hard data and pushing evidence into the government sphere. But they are ruffling the feathers of conventional economists, who have long focused on models and qualitative field data. Some worry that the new focus on randomization could skew the questions that researchers look at, or could produce black-box results that cannot answer crucial questions about why something does or does not work. But more evidence is always a good thing, and there is plenty of room for academics of all stripes.

Despite disagreements, it is hard to dispute the value of the basic goal: to ensure that governments invest their limited resources in programmes that work as advertised, and look for ways to alter or eliminate those that do not. But in the messy space occupied by social scientists, it is not always easy to determine which programmes work, which do not, and why. Enter the randomized controlled trial, in which changes are measured in a selection of individuals or groups who have been randomly assigned either to receive an intervention or not. The medical industry has used such trials to tease out the effects of drugs for decades.

As discussed on page 150, development economists have led the way, and are now running hundreds of trials that are designed to improve the effectiveness of international aid and, ultimately, the much larger pool of domestic spending by governments in developing countries. These researchers have already produced valuable insights, and governments are scaling up some of the results. Buses in Kenya are becoming safer thanks to stickers that urge passengers to speak up when they feel unsafe, and residents of the Indian state of Gujarat may soon benefit from the implementation of a new pollution-auditing system for industrial plants.


EVIDENCE-BASED INTERVENTION 
Many of the trials focus on human behaviour in very particular 
circumstances, but researchers are targeting larger questions as well. 
Some studies have looked at the impact of community-driven develop- 
ment efforts, with mixed results. Others are considering how trials 
could provide the evidence that governments and aid agencies need 
to improve the delivery of humanitarian relief. And then there are the 
long-standing questions about how much aid can accomplish in lifting 
people out of poverty. In a study published in May, a team of research- 
ers ran randomized trials in six countries looking at whether a package 
of interventions that included cash, food, health care and training 
could give a lasting boost to the poor (A. Banerjee et al. Science http://doi.org/4p7; 2015).
The evidence suggests that the answer is yes, for at least one year after the intervention
has ceased (see Nature 521, 269; 2015).

Contrast that with the Millennium Villages Project (MVP), which began in 2004 and has pushed a comprehensive aid package into villages in ten countries in Africa. MVP researchers have begun evaluating how the project has done so far (see page 144). But the research protocol that they published last month (see go.nature.com/3eidfr) acknowledges that it will be difficult to definitively answer questions about how well these villages have fared compared with surrounding villages that did not receive the intervention, largely because the project did not use an experimental approach from the outset. Information on whether the project is effective would have been useful for policymakers who are facing difficult choices about where to invest limited resources.

Many poverty-alleviation programmes focus on providing money with strings attached (only if families keep their children in school and attend health clinics, for instance), but some researchers are now advocating unconditional cash transfers. Paul Niehaus, an economist at the University of California, San Diego, co-founded the non-profit organization GiveDirectly in New York City to do just that. His team points out that many forms of development aid are complex and costly to administer, and suggests judging their effectiveness against that of simply giving money. This would be done in randomized trials, in which the intervention group receives development aid and the control group gets cash.

There is much to be learned from randomized trials, but researchers 
must acknowledge that such experiments have limitations, not least 
that they do not necessarily provide answers as to why something 
works or does not. In the case of community-driven development pro- 
grammes, for instance, a randomized controlled trial can provide basic 
statistics about whether letting community councils make their own 
decisions hastens the delivery of basic services, improves the economy 
and advances the social well-being of women. But do these councils 
actually promote social cohesion in a meaningful way? The results so 
far are mixed, and this is where agencies and institutions can benefit 
from the soft, qualitative social-science survey data that researchers 
have been seeking to go beyond. 

Randomized controlled trials will not be able to provide answers 
to all of the world’s enduring questions, and they are not the only 
way to gather solid data. Economists have developed other quasi- 
experimental approaches that do not include randomization but can 
nonetheless provide rigorous statistics with which to judge a pro- 
gramme’s effectiveness. 

In combination, these methods are providing policymakers with 
more information every day. But perhaps more important is what 
happens next. Politicians, government bureaucrats, activists and phil- 
anthropists will all be happy to talk about the value of randomized trials 
when the results support their policies and programmes. They must 
also have the courage to do so when the evidence goes the other way.
As the staff of the European Commission head for the beaches this August, they have been asked to ponder the future of science.

Research commissioner Carlos Moedas has announced his priorities as being “open science” and “open innovation”, and invited his team to report back with its ideas on how to achieve that.

These goals sound laudable enough, but they’re really rather anodyne. Sadly, the commission has closed the door on a more ambitious project. This time last year, it sought views on Science 2.0. That term implies truly radical change, including rapid evolution of the two main pillars that underpin science: the research paper and the single-investigator grant.

Predictably, the first noises to emanate from scientific leaders to the Science 2.0 consultation were sighs of scepticism. Science, they purred, organizes itself indigenously from the bottom up. Each discipline has developed its own processes and the arrival of the digital economy, the main spur for a Science 2.0 reboot, was an incremental change that the disciplines will accommodate in their own ways, in the fullness of time.

I fully understand why no one wants the 
commission to get carried away and do some- 
thing drastic. I can't help feeling, however, that 
an opportunity has been lost. A revamp of the 
commission's €11-billion (US$12-billion)-a-year 
research programmes to anticipate Science 2.0 
might have nurtured the ability of a new genera- 
tion of Europeans to develop knowledge in dif- 
ferent ways. 

Take the peer-reviewed paper, the main yard- 
stick for success or failure in almost all academic 
research careers. The paper — as its name hand- 
ily implies — has been rendered obsolete by the 
arrival of the online world. The constraints that paper publication places
on overlapping authorship, evolution of content and links to other work
and to other people’s data, have all gone.

A decade ago, many people in scientific publishing envisaged the 
broadening or abandonment of the discrete paper by around now. Yet 
because of the extent to which institutions, career paths, publishers and 
funding agencies rely on this essentially outdated concept, the paper 
stubbornly persists. 

Indeed, the measurement of academic achievement in terms of 
papers published in highly cited journals, such as this one, has evolved 
into a kind of fetish. The 2012 San Francisco Declaration on Research 
Assessment, in which almost everyone in science declared their mutual 
abhorrence for reliance on citation data, stands, in truth, as testament to 
the manner in which such data now bestride the scientific world.

The European Commission has abandoned consideration of ‘Science 2.0’,
finding it too ambitious. That was the wrong call, says Colin Macilwain.


and people demand appropriate credit for their work. 

The second pillar — the peer-reviewed, single-investigator grant — 
may hang around for longer. Since its development in the middle of the
last century, after all, the distribution of such grants has evolved as the
most tried-and-tested method of public support for science.

The difficulty here is growing political impatience with the prom- 
ised outcomes from this grant funding. The annual budget of the US 
National Institutes of Health, the largest disburser of such grants in the 
world, has been frozen now for more than a decade at just over $30 bil- 
lion. As a result, it is increasingly older people, who know how to work 
the system, who get funding: people under 40 are finding it harder and 
harder to get their foot on the ladder. One consequence is the steady 
drumbeat from Congress for ‘prizes’ and other gimmicks to circumvent 
peer review. 

The UK research councils are in similarly dire 
straits. The new secretary of state for business, 
Sajid Javid, who has been pressed to look for cuts 
of 25-40% in his budget, last month brought in the 
US consultancy firm McKinsey to look at how the 
research councils function. Bodies such as the UK 
Medical Research Council have a fine and well- 
deserved global reputation. But when McKinsey 
looks under the hood, it may well discover that 
the outputs are not what the politicians are after. 

What politicians want, these days, is ‘innovation’: an odd hybrid, sitting somewhere between
science, engineering, finance and human inclination. As everyone knows, ‘innovation’ thrives most
visibly in places such as Silicon Valley. But would-
be innovators there already occupy a culture in 
which spending several years on a PhD and then 
grinding your way to a professorship simply isn't part of the currency. 
The lifestyle changes that are under way in innovators during their most 
creative years do not align with the decades-old funding approaches used
today by the European Commission and other major funding agencies. 

There is no easy way to alter the architecture of these funding routes. 
The divisional structure of the US National Science Foundation, for 
example, is recognized even by its most senior officials as inappropriate 
in today’s multidisciplinary world. But they can’t propose change, lest 
Congress makes a hash of it.

By contrast, the commission has some room for manoeuvre — and 
the luxury of long-term planning. The programme that Moedas is start- 
ing to prepare now will run from 2021 to 2027, by which time changes 
triggered by the Internet and globalization will surely be impossible to 
resist. The commission should be planning for how the world will look 
then, and envisioning Science 2.0 would be a good place to start.


Colin Macilwain writes about science policy from Edinburgh, UK. 
e-mail: cfmworldview@googlemail.com 
RESEARCH HIGHLIGHTS
Selections from the scientific literature


CONSERVATION 


Poverty drives 
forest raiders 


Researchers have proposed 
ways to improve efforts to stop 
people illegally harvesting 
wood and food from a 
conservation park in Uganda. 
Bwindi Impenetrable 
National Park, which covers 
330 square kilometres, hosts 
half of the world’s endangered 
mountain gorillas. Mariel 
Harrison at Imperial College 
London and her colleagues 
surveyed 365 households 
around Bwindi and found that 
people in 26% of them had 
hunted for bushmeat and 20% 
had collected firewood in the 
park. These illegal activities 
were most prevalent among the 
poorest households living in 
remote areas and closest to the 
park. Focus groups reported 
poverty and resentment at 
the inequitable distribution 
of benefits from the park as 
reasons for their activities. 
The researchers suggest 
that projects combining 
conservation and development 
should benefit the poorest 
people in remote areas near 
park boundaries to reduce 
illegal activities that jeopardize 
the park’s resources. 
Conserv. Biol. http://doi.org/6nt 
(2015) 


NEURODEVELOPMENT

Mouse brain cells made primate-like

By turning on a single gene in specific neural cells in the embryonic mouse brain, researchers have made more neurons grow in the neocortex, a region that evolved to be much larger in primates than in other mammals.

Wieland Huttner at the Max Planck Institute of Molecular Cell Biology and Genetics in Dresden, Germany, and his team developed a mouse model in which they could switch on the Pax6 gene in specific neural progenitor cells, where it is expressed in humans but not in mice. They turned on the gene in cells that give rise to neurons of the neocortex, which controls advanced cognitive abilities. The team found that with sustained Pax6 expression, the progenitor cells proliferated more, resulting in more neurons in parts of the neocortex.

Pax6 could have had an important role in the evolution of the larger primate neocortex, the authors suggest.
PLoS Biol. 13, e1002217 (2015)

ZOOLOGY

Light show lures prey

Jellyfish and other marine animals could be using their fluorescent proteins to attract prey.

Proteins such as green fluorescent protein (GFP) are invaluable tools for biologists in the lab, but their role in nature has not been clear. Steven Haddock at Monterey Bay Aquarium Research Institute in Moss Landing, California, and Casey Dunn at Brown University in Providence, Rhode Island, placed the flower hat jellyfish (Olindias formosus, pictured in blue light) in a tank along with its rockfish prey and separated the two with a transparent wall. When they exposed O. formosus to blue light (the light of its underwater habitat), the tips of its tentacles fluoresced green and the rockfish attacked the barrier more often than under yellow or white light or when the jellyfish was replaced with a non-fluorescent decoy.

Herbivorous prey may be seeing the fluorescence as an indicator of chlorophyll, which also fluoresces.
Biol. Open http://doi.org/6qd (2015)


Stars align to 
show new planet 


Two teams using different
telescopes have confirmed that
a planet with a mass similar
to that of Uranus is orbiting a
distant star.

Most known exoplanets 
orbit close to their stars, but in 
2005 researchers using an effect 
called microlensing spotted a 
planet with a larger orbit. This 
effect happens when two stars 
align: the gravity of the star 
in front magnifies light from 
the one behind. Planets in the 
foreground system can alter 
this light, which allows them to 
be detected. 

David Bennett at the 
University of Notre Dame in 
Indiana and his colleagues 
used NASA’s Hubble Space 
Telescope to study the light 
from the microlensing event 
OGLE-2005-BLG-169 more 
precisely. Their observations 
indicated the presence of a
planet roughly 14 times heavier 
than Earth and more than 
3 times farther from its star. 

Another team that included 
Bennett, led by Virginie 
Batista of the Paris Institute 
of Astrophysics, used the 
W. M. Keck Observatory in 
Hawaii and found similar 
properties for the planet. 
Astrophys. J. 808, 169;
Astrophys. J. 808, 170 (2015)


How the Ebola 
vaccine protects 


The Ebola vaccine that proved 
effective in a trial of more than 
4,000 people in Guinea seems 
to work by rapidly triggering 
one arm of the immune system 
to hold back the virus while 
the body ramps up antibody 
production, according to a 
study in monkeys. 

Heinz Feldmann of the 
National Institute of Allergy 
and Infectious Diseases in 
Hamilton, Montana, and 
his colleagues tested the 
VSV-EBOV vaccine, which 
was designed to fight the 
2014 West African outbreak 
strain of Ebola virus. The 
team immunized 15 rhesus 
macaques (Macaca mulatta) 
and then infected them with 
the virus. All but one of the 
vaccinated animals survived, 
whereas all unimmunized 
animals died about a week 
after infection. 

Analysis of the surviving 
animals’ blood showed that the 
vaccine triggered the innate 
immune system to keep viral 
replication in check during 
the first days of infection, 
giving the rest of the immune 
system time to churn out 
Ebola-specific antibodies. 
Science http://doi.org/6p9 (2015) 


Why Nepal quake 
was so damaging 


The magnitude-7.8 earthquake 
that devastated much of Nepal 
on 25 April did not relieve all 
of the geological stress in the 
region — making another big 
quake probable. 

A team led by Jean-Philippe
Avouac at the University of
Cambridge, UK, used seismic
data and satellite radar to 
show that a 140-kilometre 
stretch of a major Himalayan 
geological fault shifted during 
the disaster. This transferred 
stress into neighbouring areas 
of the fault, which may now be 
more prone to rupturing in a 
future quake. 

In a separate paper, Yuji 
Yagi and Ryo Okuwaki of the 
University of Tsukuba, Japan, 
found that the earthquake 
rupture raced eastward 
from its point of origin.
The greatest movement of
the fault occurred about
50 kilometres east of the
quake’s epicentre, close to
Kathmandu.

This discovery helps to 
explain why the shaking was 
so destructive to the city, 
according to a third paper by 
another team led by Avouac. 
Although much of the ground 
in the region shook only 
moderately, the seismic energy 
was amplified across the 
Kathmandu basin in ways that 
caused tall buildings, including 
temples, to sway and collapse 
(pictured is what remains of 
Kathmandu’s Bhimsen Tower, 
known as Dharahara). 

Nature Geosci. http://doi.org/6p7 (2015);
Geophys. Res. Lett. http://doi.org/6ns (2015);
Science http://doi.org/6p6 (2015)


CELL BIOLOGY

Chemicals switch cells’ identity

Adult skin cells have been transformed directly into neurons by two independent groups in China using just small-molecule chemicals.

Reprogramming adult cells back into stem cells or directly into other types of specialized cells requires transcription factors, which modify cells genetically. To avoid tinkering with the cells’ genes, Gang Pei and Jian Zhao from the Shanghai Institutes for Biological Sciences and their colleagues worked with fibroblasts, or skin cells, from both healthy adults and people with Alzheimer’s disease, culturing them with a cocktail of small molecules to produce neurons.

Hongkui Deng at Peking University, Beijing, and his colleagues used a different set of chemicals to convert mouse fibroblasts into neurons.

Both groups made neurons that looked, fired and made functional connections just like neurons created from fibroblasts using transcription factors.

The chemicals modulate key molecular signalling pathways to change an adult cell’s identity. This approach could make it easier to reprogram cells for clinical use, say the authors.
Cell Stem Cell http://doi.org/6p4; Cell Stem Cell http://doi.org/6p5 (2015)

Venomous frogs headbutt foe

Two Brazilian frog species use sharp spines protruding from around their noses and mouths to deliver toxins in their skin to predators: the first evidence of a venomous frog.

Most frogs produce toxins in their skin but have no way of deliberately passing them on to predators. Edmund Brodie at Utah State University in Logan and his team discovered this toxin delivery when they were collecting specimens of two tree-frog species (Corythomantis greeningi and Aparasphenodon brunoi) and restraining them in their hands. C. greeningi (skull, pictured) jabbed its spiny head into the offending hand and released toxins from its skin glands, causing intense pain in the arm for several hours. When tested in mice, the venom from both frog species caused swelling and was deadly at high concentrations.

There could be more venomous amphibians than thought, the authors say.
Curr. Biol. http://doi.org/6n7 (2015)

SOCIAL SELECTION
Popular topics on social media

Bioethics comes under fire

The latest biomedical technologies, from fetal stem cells to human gene editing, offer huge potential for treating disease. They also raise tricky ethical questions that can eventually result in guidelines on how to prevent their misuse. In an opinion piece in The Boston Globe, Harvard University psychologist Steven Pinker argues that this sweeping ethical oversight delays innovation and should ‘get out of the way’ (go.nature.com/93t5ti). The article ignited much discussion on social media among bioethicists and researchers. Many disagreed with Pinker, including Daniel Sokol, a London-based bioethicist and lawyer, who wrote in a blog post that ethicists should at times ‘get in the way’ (see go.nature.com/zmluki). Research to alleviate human suffering is important, he added, but “misguided attempts to help can, and have, led to incalculable harm”.
SEVEN DAYS
The news in brief

Moon snapped from its dark side

NASA’s DSCOVR satellite captured this image of the Moon crossing in front of Earth from a vantage point 1.6 million kilometres away. DSCOVR monitors space weather from the gravitationally stable L1 point. The far side of the Moon has fewer dark lava flows, or maria, than the side that faces Earth. The most prominent flow (dark blotch, upper left) is Mare Moscoviense, named by the Soviet Union after its probe Luna 3 took the first photos of the far side of the Moon in 1959. The DSCOVR photo was released on 5 August.

Virus lab for Japan

Japan has cleared the way for its first lab to handle the highest-risk pathogens, such as Ebola virus, at a facility about 30 kilometres west of Tokyo. The National Institute of Infectious Diseases originally built the facility in Musashi-Murayama in 1981 to operate at the greatest biosafety level, BSL-4. But opposition from local residents has forced it to run as a BSL-3 lab. On 3 August, the country’s health minister reached an agreement with the mayor of Musashi-Murayama that would allow BSL-4 operations to begin.

PEOPLE

Energy director

US President Barack Obama will nominate physicist Cherry Murray as the director of the US Department of Energy’s Office of Science, the White House announced on 5 August. The position has been vacant since 2013. Murray, now at Harvard University in Cambridge, Massachusetts, was principal associate director for science and technology at California's Lawrence Livermore National Laboratory from 2007 to 2009. She also worked at Bell Laboratories in Murray Hill, New Jersey, from 1978 to 2004. At the energy department, Murray will oversee a US$5-billion research budget.

POLICY

Interrogation ban

The American Psychological Association (APA) has decided to ban psychologists from participating in military interrogations, responding to a damning report implicating the organization in the torture of detainees by US military and intelligence agencies. At the organization's annual meeting in Toronto on 7 August, APA's council of representatives voted to approve the measure by 156 to 1, with 7 abstentions. The APA parted company with several senior officials after the report was released, and it says that it will convene a panel to review its ethics guidelines.

Scots say no to GM

Scotland has declared its intention to opt out of growing genetically modified (GM) crops approved by the European Union (EU) on non-scientific grounds, making it the first region to do so under new EU rules. Rural affairs secretary Richard Lochhead said on 8 August that Scotland wished to protect its ‘clean, green’ status. The rules were introduced in April to overcome a political impasse in which EU member states that are divided on the principle of GM crops have blocked approvals for safety-cleared crops. Countries have until 3 October to opt out of the varieties currently being assessed by the European Commission for cultivation.


Australia emissions 


Australia’s government plans to lower national greenhouse-gas emissions to 26–28% below 2005 levels by 2030, it announced on 11 August. It is the latest country to state its commitment ahead of a United Nations global-warming summit in Paris this December. Prime Minister Tony Abbott said that the targets balanced the need to slow climate change and to promote strong economic growth. Australia has the highest per capita emissions of any of the 34 industrialized countries in the Organisation for Economic Co-operation and Development.


End for Ada project 


The Ada Initiative, which has spent four years addressing sexism and harassment at science and technology conferences, announced on 4 August that it will cease activities in mid-October. The non-profit organization has sought to eliminate environments that discriminate against women. It has encouraged conferences in fields including artificial life, physics and entomology to formally adopt anti-harassment policies. The organization said it decided to close after it had difficulty finding a suitable head to replace its founders.




Debris discovery

A section of an aircraft wing, found on an island in the Indian Ocean on 29 July, is probably from missing Malaysia Airlines flight MH370, a French prosecutor said on 5 August. The announcement came after investigators in Toulouse, France, examined the debris (pictured) and determined that it is from a Boeing 777, the same model as the missing plane. In a stronger statement, Malaysian Prime Minister Najib Razak announced definitively that the wing fragment belonged to MH370. The families of passengers who were on the flight reacted to the mixed messages with disbelief and anger.

TREND WATCH

The current rate of global glacier retreat is “historically unprecedented”, researchers at the World Glacier Monitoring Service reported on 30 July (M. Zemp et al. J. Glaciol. 61, 745–762; 2015). The team compiled more than 47,000 observations, dating as far back as the sixteenth century, and found that glaciers are now shrinking nearly twice as fast as they were during the late twentieth century. Even if the climate stabilizes, ice loss is expected to continue in many areas.


Iran deal backed 


A group of 29 leading US physicists and experts on nuclear weapons and arms control wrote a letter to President Barack Obama on 8 August supporting the Iran nuclear agreement made with the United States and five other countries on 14 July. The letter’s authors include the chief executive of the American Association for the Advancement of Science; a former head of the Los Alamos National Laboratory in New Mexico; and six Nobel prizewinners. They say that the deal includes “more stringent constraints than any previously negotiated non-proliferation framework”. The letter is intended to sway undecided US Congress members ahead of a vote that could derail the agreement in September.


Ticket to space 
NASA has purchased rides 
worth US$490 million aboard 
Russia's Soyuz spaceships, 
agency head Charles Bolden 
wrote to the US Congress on 
5 August. Ever since the US 
space shuttle programme 
ended in 2011, NASA has 
relied on Russia to transport 
its astronauts to and from the 
International Space Station. 
The contract extends their 
agreement to the end of 2018, 
adding to the $458 million 
that NASA is already paying 
for Soyuz flights in 2017. 

The agency is seeking private 
ways to ferry astronauts to 
space, but cuts to government 
funding of the ‘commercial- 
crew’ programme have 
hindered progress. 


Rare diseases bid
Drug firm Shire has proposed a US$30-billion hostile takeover of its competitor Baxalta in Bannockburn, Illinois. Both companies make drugs to treat rare diseases. The offer from Dublin-based Shire was made public on 4 August to appeal directly to Baxalta shareholders after an initial approach was rebuffed privately on 10 July. Baxalta’s directors say that the offer significantly undervalues the company, which specializes in treatments for immune disorders and rare blood conditions.

GLACIERS RETREATING EVER FASTER
The world’s glaciers have been shrinking since the 1960s, but their cumulative mass is now disappearing faster than ever before.
Mass balance (metres of water equivalent)

13 AUGUST
The Rosetta spacecraft, which is studying the comet 67P/Churyumov-Gerasimenko in detail, reaches perihelion, the closest point to the Sun during its journey.

16-20 AUGUST
Around 12,500 chemists gather in Boston, Massachusetts, for the 250th American Chemical Society meeting.
go.nature.com/Sh3gxt

16-21 AUGUST
Geochemists meet at the 25th Goldschmidt conference in Prague. To mark the anniversary, 25 lectures will highlight geochemical advances since the first conference.
go.nature.com/hu9x5x


3D-printed pills 

The US Food and Drug 
Administration (FDA) 
approved the first 3D-printed 
drug, an antiepileptic 
medication called 
levetiracetam, on 3 August. 
Manufactured by Aprecia 
Pharmaceuticals in Langhorne, 
Pennsylvania, the printed drug 
is a porous pill that dissolves 

in the mouth to make it easy to 
swallow. The company expects 
the drug to arrive on the market 
in early 2016 and intends to 
produce more 3D-printed 
therapies for central nervous 
system disorders. 


Workers at the Sendai nuclear power plant conduct an emergency safety drill ahead of the restart that ended Japan’s two-year nuclear freeze.


ENERGY POLICY 


Japan ends nuclear hiatus 


Return to nuclear energy will reduce carbon emissions but not by nearly enough. 


BY DAVIDE CASTELVECCHI 


The Sendai Nuclear Power Plant on the island of Kyushu broke a four-year lull on 11 August when it switched one of its reactors back on. The restart is the first since Japan’s nuclear-power industry ground to a halt two years ago following safety concerns in the wake of the 2011 Fukushima Daiichi disaster.

It will help the world’s third-largest economy 
to lower its carbon emissions. But the govern- 
ment energy plan that includes this shift in pol- 
icy is much too modest if Japan is to help keep 
global temperatures from rising by more than 
2°C above pre-industrial levels, say analysts. 

The plan essentially returns the nation to its pre-Fukushima energy mix of 
mainly coal and nuclear power, apart from a small but substantial increase in 
solar. “The mindset of the government and the heavy industry is still the 
same: to try to keep nuclear and also coal,” says Tetsunari Iida, head of the 
Institute for Sustainable Energy Policies in Tokyo. 

Before the disaster, Japan had aimed to 
produce about half of its electricity from nuclear 
sources. Following the meltdowns, the short- 
lived cabinet of then prime minister Yoshihiko 
Noda considered phasing out nuclear energy 
entirely (see Nature 486, 15; 2012) and replac- 
ing it with renewables and fossil fuels. 

However, the current government of Prime 
Minister Shinzo Abe, which took over in 
2012, has put nuclear back into the picture, 
with plans to restart as many reactors as possible 
(see Nature 507, 16-17; 2014). In July, the government 
submitted its targets for reducing greenhouse-gas 
emissions to the United 
Nations, ahead of a UN global-warming sum- 
mit in Paris this December. The pledge included 
a goal for nuclear energy to fulfil at least 20% 
of Japan's electricity needs by 2030. Renewable 
sources — mostly hydropower but also solar — 
would contribute a minimum of 22%. 

This would reduce Japan's carbon footprint 
compared with the years since Fukushima, 
when electricity companies bridged the nuclear 
gap by ramping up the use of coal, oil and, 
especially, liquefied natural gas. But fossil fuels 
would still account for more than half the power 
generated in 2030. Nuclear and renewables 
would help keep carbon dioxide emissions in 
check, but overall emissions would be cut by 
only 18% from 1990 levels. The European 
Union, by comparison, pledged 40% cuts 
from 1990. “I think that the government under- 
stands and acknowledges the climate goal and 
tries to make its target consistent with it, but 
industrial and economic criteria such as lowering 
electricity costs are given higher priority,” 
says Seita Emori, who heads a climate risk- 
assessment team at Japan's National Institute for 
Environmental Studies in Tsukuba. The 2030 
emissions target “doesn’t look really sufficient 
for the climate goal”. 

The government sees an especially modest 
role for wind, projected to contribute only 1.7% 
of electricity generation by 2030. (Germany, 
by comparison, already derives around 8-9% 
of its power from wind.) Iida says there is an 
“irrational bias” against wind that is deep- 
rooted in Japan's energy industry. 

Moreover, the way Japan’s energy market is structured, with a few de facto 
regional monopolies, is stacked against wind, favouring instead sources that 
are established, such as nuclear and fossil fuels. “Power companies control 
both the grid and existing power plants,” says Tomas Kåberger, head of the 
Tokyo-based Japan Renewable Energy Foundation. Wind would take a share of the 
market away from the utilities’ power plants, but the same utilities could 
deny wind-power companies access to the grid, says Ali Izadi-Najafabadi, who 
heads the Tokyo office of the consulting company Bloomberg New Energy Finance. 
The utilities must cite “technical grounds” for such a refusal, but “there is 
no independent grid operator, so it’s hard to judge those technical grounds”, 
he says. 

To switch back on, the Sendai plant had to 
satisfy increased scrutiny from regulators and 
the courts. Following the 2011 meltdowns, the 
Japanese government overhauled its nuclear 
safety policy, reviewed its atomic-energy infra- 
structure and created the independent Nuclear 
Regulation Authority (NRA). Izadi-Najafabadi 
says that the NRA showed that it can bite as well 
as bark when it forced utilities to decommission 
some of their more troublesome reactors. Still, 
anti-nuclear advocates complain that reforms 
have not gone far enough, in particular on eval- 
uating seismic and volcanic risks and preparing 
evacuation plans, and that the NRA has bowed 
to political pressure to speed up its reviews. 

Nuclear-safety culture has made progress 
since Fukushima, says Amory Lovins, co- 
founder of the Rocky Mountain Institute, an 
energy think tank in Snowmass, Colorado. 
But, he adds, “there is still a troublesome and 
pervasive lack of transparency”. 


DEVELOPMENT 


Flagship aid programme 
up for evaluation 


The Millennium Villages Project in Africa begins analysis of first ten years to test impact. 


BY JEFF TOLLEFSON 


The Millennium Villages Project (MVP) 
stands out among development efforts 
in Africa. Since its launch in 2004, it 
has attracted generous donations and high- 
wattage supporters — including Hollywood 
actor Angelina Jolie and United Nations sec- 
retary-general Ban Ki-moon — for its work on 
alleviating poverty in rural Africa. The pro- 
gramme has delivered aid to at least 500,000 
people in 10 countries, and has been emulated 
in others. 

But its effectiveness has never been thor- 
oughly tested. With the publication of a 
research plan in The Lancet last month (see 
go.nature.com/3eidfr), the MVP has now 
embarked on its first comprehensive evalua- 
tion in the hope of addressing long-standing 
questions about its impact. 

Led jointly by the Earth Institute at Columbia 
University and the non-profit organization 
Millennium Promise, both in New York, the 
MVP aims to lift clusters of villages out of 
poverty through interventions ranging from 
building health centres, roads and schools to 
improving agriculture and sanitation. It began 
in Kenya in 2004, and as of 2013 had an annual 
budget of about US$25 million. 

“We will, I believe, be able to gain very useful insights about costs, 
processes, technologies, information systems, local and national governance, 
among other issues,” says Jeffrey Sachs, who runs the Earth Institute and who 
conceived the programme. 

Building water-delivery systems is one measure used by the Millennium Villages Project in Africa. 

As economists increasingly advocate randomized controlled trials of 
international aid programmes (see page 150), Sachs has faced criticism for not 
setting up the MVP as a rigorous experiment. MVP researchers are now trying 
retroactively to compare villages that received the full intervention with 
similar ones that did not, but the research protocol readily acknowledges 
challenges in collecting data and producing statistically significant results. 

“I expect that the authors will conclude that, although we cannot prove that 
MVP works, we also cannot rule out that it works,” says Annette Brown, who 
heads the Washington DC office of the International Initiative for Impact 
Evaluation, a non-profit organization that funds and analyses such evaluations. 


GUILLAUME BONN/CORBIS 


In March, preliminary results from a 
study commissioned by the UK Depart- 
ment for International Development 
(DFID) in 2011 found little benefit from an 
£11.5-million ($18-million) expansion of 
the Millennium Villages project in north- 
ern Ghana. Sachs asserts that effects are 
hard to see at the Ghana site because it is in 
its early stages; his critics see the analysis as 
further evidence that the Millennium Vil- 
lages approach may not work as advertised. 
“The trumpeting of the project as a model 
is just indescribably disproportionate to the 
deafening silence about its actual results,” 
says Michael Clemens, a senior fellow at 
the Center for Global Development, a non- 
profit think tank in Washington DC. 

Clemens has long been an outspoken 
critic of the MVP and was among researchers 
who challenged a 2012 study in The Lancet 
that reported that child mortality 
had dropped in Millennium Villages 
three times faster than elsewhere in the 
host nations. The challenge ultimately led 
to a retraction of that claim by the paper's 
lead author. Clemens argues that aid money 
should be spent either on projects that 
generate useful knowledge or on things 
that have been shown to work, noting that 
malaria bednets, which have a demonstrated 
benefit and are part of the MVP’s suite of 
interventions, cost $15-20 per household. 

The MVP typically budgets $120 per 
capita annually, according to its website, 
although Sachs says that outside contribu- 
tions can reduce MVP’s investment to half 
that. At the Ghana site funded by DFID, the 
total investment by all parties was projected 
to be $27.1 million over 5 years for 30,000 
people. That is $181 per person annually, 
or about $4,500 per household over the 
course of the project — less than the $5,408 
per household calculated by a randomized 
controlled trial in Ghana testing a two-year 
package of interventions that included food, 
cash, health services and training. 
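
The per-person and per-household figures quoted for the Ghana site can be reproduced with a few lines of arithmetic. The sketch below is only a check on the numbers in the text. The household size of five people is an assumption (the article does not state it), chosen because it brings the result close to the quoted figure of about $4,500, and the function names are illustrative.

```python
# Check of the Ghana-site cost figures quoted in the text.
# ASSUMED_HOUSEHOLD_SIZE is an assumption, not a figure from the article.

def per_capita_annual(total_usd, years, people):
    """Average spending per person per year."""
    return total_usd / (years * people)


def per_household_total(total_usd, people, household_size):
    """Spending per household over the whole project."""
    return total_usd / people * household_size


TOTAL_USD = 27.1e6          # projected investment by all parties
YEARS = 5                   # project duration
PEOPLE = 30_000             # population covered
ASSUMED_HOUSEHOLD_SIZE = 5  # hypothetical average household size

annual = per_capita_annual(TOTAL_USD, YEARS, PEOPLE)
household = per_household_total(TOTAL_USD, PEOPLE, ASSUMED_HOUSEHOLD_SIZE)

print(f"Per person per year: ${annual:.0f}")       # about $181
print(f"Per household, 5 years: ${household:.0f}")  # about $4,517
```

Under that assumption the five-year household cost comes out slightly above $4,500, still below the $5,408 per household reported for the two-year trial package.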

The MVP hopes to release its analysis 
by the end of 2016, and Sachs says that 
his team will be in a better position to 
talk about cost-effectiveness and other 
considerations once the analysis is out. 

Dean Karlan, an economist at Yale 
University in New Haven, Connecticut, 
says that it is probably too late for the pro- 
ject itself to advance the science of global 
development in a significant way, but 
he credits Sachs with raising awareness 
about global poverty issues. “I do see it as 
a missed opportunity,” Karlan says, “but in 
the grand scheme of things there are tons 
of missed opportunities.” SEE EDITORIAL P.135 
Anti-GM group expands 
probe into industry ties 


Activists seek release of records from 40 researchers at US public universities. 


BY KEITH KLOOR 


Michelle McGuire, a nutrition scientist at Washington State University in 
Pullman, was stunned last month when activists who oppose the use of 
genetically modified (GM) organisms asked to read her e-mail. 

US Right to Know of Oakland, California, 
filed a request under Washington's freedom- 
of-information law to see her correspondence 
with, or about, 36 organizations and compa- 
nies. McGuire is one of 40 US researchers who 
have now been targeted by the group, which 
is probing what it sees as collusion between 
the agricultural biotechnology industry and 
academics who study science, economics and 
communication. 

That investigation, which began in February, has just started to yield 
documents. These include roughly 4,600 pages of e-mails and other records from 
Kevin Folta, a plant scientist at the University of Florida in Gainesville and 
a well-known advocate of GM organisms. The records, which the university gave 
to US Right to Know last month, do not suggest scientific misconduct or 
wrongdoing by Folta. But they do reveal his close ties to the agriculture giant 
Monsanto, of St Louis, Missouri, and other biotechnology-industry interests. 

The documents show that Monsanto 
reimbursed Folta for trips he took to speak 
to US students, farmers, politicians and the 
media. Other industry contacts occasionally 
sent him suggested responses to common 
questions about GM organisms. 

“Nobody ever told me what to say,” says 
Folta, who considers public outreach to be 
a key part of his job. “There's nothing I have 
ever said or done that is not consistent with 
the science.” 

He adds that he has never accepted honoraria for outreach work, and that the 
University of Florida does not require him to disclose travel reimbursements. 
But the e-mails show that Folta did receive an unrestricted US$25,000 grant 
last year from Monsanto, which noted that the money “may be used at your 
discretion in support of your research and outreach projects”. Folta says that 
the funds are earmarked for a proposed University of Florida programme on 
communicating biotechnology. 

Monsanto spokeswoman Charla Lord says 
that the company was “happy to support Dr 
Folta’s proposal for an outreach programme 
to increase understanding of biotechnology”, 
and that the $25,000 grant “predominately cov- 
ered travel expenses”. Lord adds that Monsanto 
considers public-private collaborations to be 
“essential to the advancement of science”. 

Such explanations do not satisfy Gary 
Ruskin, executive director of US Right to 
Know. “I think it’s important for professors 
who take money from industry to disclose it,” 
he says. “And if they’re not disclosing it, that’s 
a problem. And if they say they aren’t taking 
money, and they are, then that’s a problem.” 

Ruskin’s group, which was founded in 
2014, calls for mandatory labelling of food 
that contains GM ingredients — even though 
numerous scientific bodies, including the US 
National Academy of Sciences, have found 
no evidence that such food harms human 
health. 

US Right to Know launched its investigation 
of academic researchers after it noticed that 
several had fielded questions about crop bio- 
technology on a website called GMO Answers, 
which is funded by members of the biotech 
industry. The group considers the site, which 
is aimed at consumers and managed by public- 
relations firm Ketchum of New York, to be a 
“straight-up marketing tool to spin GMOs in 
a positive light”. It is now seeking the records of 
public-sector researchers — who are subject to 
state freedom-of-information laws — to con- 
firm its suspicions. 

Ruskin says that the group has received 
responses to about 10% of its records requests. 
At least one institution, the University of 
Nebraska, has refused to provide documents 
requested by the group. 

US Right to Know argues that its requests 
are reasonable, because the researchers who 
are under scrutiny are public employees. “Part 
of democracy is that we get to know what our 
public employees do,” says Ruskin. 

McGuire is not sure why the group is seeking her records, because she has not 
contributed to the GMO Answers website. Some of her recent research refutes 
claims that glyphosate, a herbicide often used on GM crops, accumulates in 
breast milk; the work relied on an assay developed with assistance from 
Monsanto. Still, says McGuire, “I’m a milk-lactation researcher.” 

But Folta’s e-mails show him to be a 
frequent contributor to GMO Answers. 
Ketchum employees repeatedly asked him 
to respond to common questions posed by 
biotechnology critics. In some cases, they 
even drafted answers for him. “We want 
your responses to be authentically yours,” 
one Ketchum representative wrote in a mes- 
sage on 5 July 2013. “Please feel free to edit 
or draft all-new responses.” 

“They thought they could save me time by providing canned answers,” Folta says 
of his “extremely annoying” Ketchum contacts. “And I don’t know if I used them, 
modified them or what, but they stopped doing it at some point.” He adds that 
the correspondence obtained by US Right to Know reveals only a fraction of his 
work as a scientist, and taken alone does not paint an accurate picture of his 
work. 

Bruce Chassy, a toxicologist at the 
University of Illinois at Urbana-Champaign 
who is the subject of two freedom-of- 
information requests by US Right to Know, 
says that his e-mails would reveal a similar 
portrait of “people trying to defend the 
science against malicious attacks”. 

But Chassy acknowledges the ethical 
questions raised by close relationships 
between the biotech industry and the pub- 
lic sector. “Are we working for them, or are 
they working for us?” he asks. “Probably a 
little bit of both” — in part because univer- 
sities and companies often have overlap- 
ping research interests. US Right to Know 
aims to reveal this overlap in full. 

Michael Halpern, an expert on scientific 
integrity at the Union of Concerned Sci- 
entists in Washington DC, says that Folta’s 
case suggests that universities should do 
more to educate researchers on what consti- 
tutes a conflict of interest and what types of 
financial relationship should be disclosed. 

“It behooves scientists to disclose their funding sources so there’s no 
perception of inappropriate influence,” says Halpern. “But that doesn’t mean 
all private money is tainted or suspect.” 


Samples from the Ebola epidemic in West Africa are held by public-health agencies in the region and abroad. 


Biobank planned for Ebola samples 


International public-health officials discuss how to maximize 
research benefits of a widely dispersed collection. 


BY ERIKA CHECK HAYDEN 


As West Africa’s Ebola outbreak winds down, an effort is under way to make the 
best use of the tens of thousands of patient samples collected by public-health 
agencies fighting the epidemic. On 6-7 August, the World Health Organization 
(WHO) convened a meeting in Freetown, Sierra Leone, to discuss how to establish 
a biobank for up to 100,000 samples of blood, semen, urine and breast milk from 
confirmed and suspected Ebola patients, as well as swabs taken from the bodies 
of people who died from the virus. Held by health agencies in both West Africa 
and the West, the samples could be valuable in understanding how the current 
Ebola crisis evolved, preparing for future outbreaks and developing 
public-health research capacity in a region that depends on outside experts. 

“There are many, many ways that this 
resource could be precious,” says Cathy Roth, 
an adviser to the WHO directorate in Geneva, 
Switzerland, which arranged the meeting as 
part of a series of international discussions 
about the creation of an Ebola biobank. One of 
the difficulties is that there is no blueprint for 
how such a biobank would work, so countries 
have not yet committed to joining it. 
One proposal has been to link existing 
collections in an online biobank with a ref- 
erence laboratory in Africa that would hold 
certain samples — for instance, collections 
taken from notable groups of patients, or from 
people who were followed especially closely 
throughout the course of their disease. Such 
a facility would be a first for the region; there 
is currently no high-containment lab in the 
Ebola zone that is suitable for studies of live, 
highly dangerous viruses. 

Although the samples vastly outnumber 
those collected in previous outbreaks, they 
are still a finite resource. Ongoing discussions 
will need to grapple with who decides what 
the samples can be used for and what kinds of 
research should be emphasized. 

“We want to have defined research priori- 
ties, because these samples do represent an 
exhaustible resource,” says Ethan Guillen, 
project manager of the Ebola initiative at 
Médecins Sans Frontières (MSF) in Geneva, 
which is advocating for the biobanking project. 

Assigning control over these decisions 
to the three countries where the majority of 
Ebola cases occurred — Guinea, Liberia and 
Sierra Leone — is a major priority for MSF. 
Historically, much research on viral haemor- 
rhagic fevers such as Ebola has been done by 
scientists in developed countries using samples 
taken from developing countries. Guillen sees 
that as part of the reason why, 40 years after 
Ebola was first documented in Africa, there 
is not enough public-health capacity in some 
countries to contain the disease or effective 
tools to treat or prevent it. 

Guillen says that affected countries need a 
system in which “their scientists have a say in 
what happens and can learn from the experi- 
ences, so there doesn't have to be such a reli- 
ance on outside actors”. 

Already, thousands of samples have been shipped out of Africa by foreign 
government agencies that stepped in to test patients for Ebola during the 
outbreak. Several, including the US Centers for Disease Control and Prevention 
(CDC) in Atlanta, Georgia, the European Mobile Laboratory Project and the 
Pasteur Institute in Paris have expressed cautious support for the biobanking 
idea. 

“CDC is supportive of the concept of Ebola 
biobanks for samples, particularly in the 
affected countries, which would offer organi- 
zations around the world access to samples 
from the largest Ebola outbreak in history,” 
the agency told Nature. 
Only the European mobile lab provided 
detailed information on how many samples it 
has — about 3,000 in a high-containment lab 
in Hamburg, Germany, says virologist Stephan 
Günther of the Bernhard Nocht Institute for 
Tropical Medicine in Hamburg, which implements 
the lab project. Günther says that the 
European mobile lab is acting as custodian 
of the samples, which are still owned by the 
countries in which they were collected. The 
project has signed agreements with Sierra 
Leone and Guinea that guarantee access for 
researchers from those countries, he adds. 

Public Health Canada says that it has not yet 
shipped samples out of West Africa but would 
not reveal where it is holding them, citing 
“biosafety and biosecurity concerns”. 

The CDC says that it is keeping samples 
both in West Africa and the United States but 
would not state how many it holds. In Decem- 
ber, it shipped 7,000 from Sierra Leone to the 
United States (see Nature http://doi.org/6jm; 
2014). However, the agency has come under 
fire recently for major lapses in biosecurity (see 
Nature http://doi.org/6jn; 2015), and raised 
this among a number of potential hurdles to 
creating a biobank. 

Guillen says that he is hopeful that these 
issues can be worked through. “We need 
better research tools,” he says. “Hopefully we 
can move quickly to get these tools in place.” 
Age of the NEUTRINO 


BY ELIZABETH GIBNEY 
GRAPHIC BY NIGEL HAWTIN 


As researchers at CERN, Europe’s particle-physics laboratory near Geneva, 
dream of super-high-energy colliders to explore the Higgs boson, their 
counterparts in other parts of the world are pivoting towards a different 
subatomic entity: the neutrino. 

Neutrinos are more abundant than any particle other than photons, yet they 
interact so weakly with other matter that every second, more than 100 billion 
stream, mainly unnoticed, through every square centimetre of Earth. Once 
thought to be massless, they in fact have a minuscule mass and can change type 
as they travel, a bizarre and entirely unexpected feature that physicists do 
not fully understand (see ‘An unconventional particle’). Indeed, surprisingly 
little is known about the neutrino. “These are the most ubiquitous matter 
particles in the Universe that we know of, and probably the most mysterious,” 
says Nigel Lockyer, director of the Fermi National Accelerator Laboratory 
(Fermilab) in Batavia, Illinois. 

Four unprecedented experiments look poised to change this. Two, one in China 
and one in India, already have the go-ahead, and plans to erect detectors in 
Japan and the United States are in the works (see ‘Where they will be 
detected’). Buried underground to prevent interference from other particles, 
all four are designed to detect many more neutrinos, and to probe the switching 
process in more detail, than any existing experiment. 

The results are expected to feed into some of the most fundamental questions 
in cosmology (see ‘Flurry of experiments’). Some of the experiments will make 
their own neutrinos; all will use any they can capture from the Sun or from 
supernova explosions. “The age of the neutrino,” Lockyer says, “could go on for 
a very long time.” 


AN UNCONVENTIONAL PARTICLE 


Flurry of experiments 

The detectors in China (JUNO) and India (INO) are designed to untangle the 
relationship between the three mass states, with implications for the origins 
of the forces of nature. By contrast, DUNE in the United States and 
Hyper-Kamiokande in Japan aim to spot differences in how neutrinos and 
antineutrinos oscillate between flavours. That could solve a second 
cosmological puzzle: why the Universe is made up of matter rather than 
antimatter. All four detectors will also hunt for a hypothesized ‘sterile’ 
neutrino. 


NEUTRINO FACTORIES 

Neutrinos are everywhere, generated by a variety of processes. 

Fusion of hydrogen nuclei to form helium in the Sun. 

Supernovae and collisions between cosmic rays and air particles in Earth’s 
atmosphere. 

Particle accelerators smashing protons into a target, and fission from the 
radioactive decay of elements inside nuclear reactors. 


WHERE THEY WILL BE DETECTED 

Deep Underground Neutrino Experiment (DUNE), United States 
Status: Planned 
Cost: US$1 billion 
Will make highest-energy neutrinos of any experiment. 

Hyper-Kamiokande, Japan 
Status: Planned 
Cost: About $800 million 
Will be the world’s largest neutrino detector; it is 25 times bigger than 
its predecessor, Super-Kamiokande. 

Jiangmen Underground Neutrino Observatory (JUNO), China 
Status: Construction begun 
Cost: $330 million 
Sits under 700 metres of rock. 

India-based Neutrino Observatory (INO), India 
Status: Funding approved 
Cost: $233 million 
Will be largest experimental basic-science facility in India. 


BIG QUESTIONS 


What is the mass hierarchy? 


Although physicists know that neutrinos exist in 
three different mass states, which state is the 
lightest and which is the heaviest remains a 
mystery. Knowing that would help scientists to 
decide between rival theories about how the four 
forces of nature unite as a single force at high 
energies, similar to those experienced in the 
moments after the Big Bang. 


JUNO 


Will measure the rate 
at which antineutrinos 
of different energies 
created at the 
Yangjiang and Taishan 
nuclear power plants 
(53 kilometres apart) 
switch flavour to 
calculate the 
differences between 
mass states. 
NORMAL 


INVERTED 


INO 


Will detect neutrinos 
and antineutrinos 
produced by cosmic 
rays from the other side 
of Earth. If the journey 
boosts neutrino 
switching, this implies a 
normal mass hierarchy; 
if antineutrino switching 
speeds up, the inverted 
hierarchy is likely. 


Why is there so little antimatter? 


A major puzzle is why the Universe is 
filled with matter, rather than antimatter. 
Differences in how neutrinos and 
antineutrinos oscillate between flavours 
as they travel could provide a clue. 


DUNE 


Will send neutrinos of 
different energies from 
Fermilab to the 
Sanford Underground 
Research Facility in 
South Dakota. 
Physicists will record 
differences in the way 
neutrinos and 
antineutrinos oscillate 
and how this depends 
on their energy. 


1,300 km 
Is there a ‘sterile’ neutrino? 


Some theories propose a fourth, sterile, neutrino. 
If it exists, it would interact with matter even more 
weakly than the other flavours, and could account 
for the as-yet-undetected dark matter that is 
thought to make up 85% of all the matter in the 
Universe. If neutrinos mysteriously ‘disappear’ at 
a detector, that could be a sign that they have 
switched into sterile neutrinos. 


Hyper-Kamiokande 


Neutrinos and 
antineutrinos will travel 
from the Japan Proton 
Accelerator Research 
Complex (J-Parc) in 
Tokaimura. Particles 
will be of a single 
energy, selected to 
maximize the detection 
of flavour switching 
over the distance from 
J-Parc. 
A new generation of 
economists is trying 
to transform global 
development policy through 
the power of randomized 
controlled trials. 


REVOLT OF THE RANDOMISTAS 


BY JEFF TOLLEFSON 


In 70 local health clinics run by the Indian state of Haryana, the parents of a 
child who starts the standard series of vaccinations can walk away with a free 
kilogram of sugar. And if the parents make sure that the child finishes the 
injections, they also get to take home a free litre of cooking oil. 
These simple gifts are part of a massive trial testing whether rewards can 
boost the stubbornly low immunization rates for poor children in the region. 
Following the model of the randomized controlled trials (RCTs) that are 
commonly used to test the effectiveness of drugs, scientists randomly assigned 
clinics in the seven districts with the lowest immunization rates to either 
give the gifts or not. Initial results are expected next year. But 
smaller-scale experiments suggest that the incentives have a good chance of 
working. In a pilot study conducted in India and published in 2010, the 
establishment of monthly medical camps saw vaccination rates triple, and adding 
on incentives that offered families a kilogram of lentils and a set of plates 
increased completion rates by more than sixfold. 
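
The assignment step in a trial of this kind can be sketched in a few lines. The example below is a minimal illustration, not the Haryana team's actual procedure: the clinic names, the 20 clinics per district, the even split within each district and the fixed random seed are all assumptions made for the sake of a runnable example.

```python
import random


def assign_clinics(clinics_by_district, seed=0):
    """Randomly split the clinics in each district into two equal arms.

    Returns a dict mapping clinic name to "incentive" or "control".
    A fixed seed makes the assignment reproducible and auditable.
    """
    rng = random.Random(seed)
    assignment = {}
    # Sort districts so the result does not depend on dict ordering.
    for district, clinics in sorted(clinics_by_district.items()):
        shuffled = list(clinics)
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        for clinic in shuffled[:half]:
            assignment[clinic] = "incentive"
        for clinic in shuffled[half:]:
            assignment[clinic] = "control"
    return assignment


# Hypothetical data: 7 districts with 20 clinics each (made-up numbers).
clinics_by_district = {
    f"district_{d}": [f"d{d}_clinic_{c}" for c in range(20)]
    for d in range(1, 8)
}

assignment = assign_clinics(clinics_by_district, seed=42)
n_incentive = sum(1 for arm in assignment.values() if arm == "incentive")
print(n_incentive, "of", len(assignment), "clinics offer the gifts")
```

Randomizing within each district (a stratified design) keeps the two arms balanced across districts, so that differences in immunization rates are less likely to reflect where a clinic happens to be.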

“We have learned something about why immunization rates are low,” 
says Esther Duflo, an economist at the Massachusetts Institute of Tech- 
nology (MIT) in Cambridge, who was involved in the 2010 experiment 
and is working with Haryana on its latest venture. The problem is not 
necessarily that people are opposed to immunization, she says. It is that 
certain obstacles, such as lack of time or money, are making it difficult 
for them to attend the clinics. “And you can balance that difficulty with a 
little incentive,” she says. 
This is one of a flood of insights from researchers who are revolutionizing the 
field of economics with experiments designed to rigorously test how well social 
programmes work. Their targets range from education programmes to the 
prevention of traffic accidents. Their preferred method is the randomized 
trial. And so they have come to be known as the ‘randomistas’. 

The randomistas have been particularly welcomed in the global 
development arena. Despite some US$16 trillion in aid having flowed 
to the developing world since the Second World War, there are little 
empirical data on whether that money improves the recipients’ lives (see 
page 144). The randomistas see their experiments as a way to generate 
such data and to give governments tools to promote development, relieve 
poverty and focus money on things that work. 


FEATURE 


Trials are showing that 
offering incentives can 
boost attendance at 
vaccination clinics. 


Not everyone is convinced. Sceptics argue 
that the randomistas’ focus on evaluating 
specific aid programmes can lead them to lose 
sight of things such as energy, infrastructure, 
trade and corruption — macroeconomic issues 
that are central to a country’s ability to prosper, but that are effectively 
impossible to randomize. “Development is ultimately about politics,” says 
Angus Deaton, an economist at Princeton University in New Jersey. 

Nonetheless, the randomista movement is gaining momentum 
(see ‘Scale the heights’). Universities are pumping out more economics 
graduate students with experience in RCTs every year. Organizations 
ranging from the UK Department for International Development to the 
Bill & Melinda Gates Foundation in Seattle, Washington, are throwing 
their financial support behind the technique. “There are hundreds and 
hundreds of randomized trials going on, and ten years ago that just wasn't 
the case,” says economist Dean Karlan at Yale University in New Haven, 
Connecticut, who is at the forefront of the movement. “We've changed 
the conversation.” 

Demand is only rising. This September, governments will gather in 
New York under the auspices of the United Nations to approve a new set of 
Sustainable Development Goals, which are intended to guide investments 
over the coming decade. And in December, questions about financial 
aid will be high on the agenda at the UN climate summit in Paris, where 
governments expect to sign a new climate agreement that will probably 
include commitments by industrialized nations to funnel money into 
sustainable development in poorer countries. In both cases, the effective- 
ness of the programmes is likely to be a key concern. 

“This is front and centre on a lot of people’s agenda,” says Ann Mei 
Chang, who is executive director of the Global Development Lab at the 
US Agency for International Development (USAID) in Washington DC. 
“Where do we get the biggest bang for our buck?” 


PROGRESS AND OPPORTUNITIES 

RCTs have been used to test the effectiveness of social programmes at least 
since the 1960s. But the modern era began in 1997, when one of the most 
famous and influential RCTs in public policy began in Mexico. 

The experiment had its origins three years earlier, when Mexican 
President Ernesto Zedillo assumed office in the middle of an economic 
crisis and assigned economist Santiago Levy to devise a programme to 
help poor people. Sceptical of the conventional approach — subsidies 
for products such as tortillas and energy — Levy designed a system that 
would provide cash payments to poor families if they met certain require- 
ments, such as visiting health clinics and keeping their children in school. 
“And because people were very critical about what I was doing,” says Levy, 
who now leads strategic development planning at the Inter-American 
Development Bank in Washington DC, “I wanted to ensure that we had 
numbers so that we could have an informed debate.” 

As it happened, Levy had a natural control group for his experiment. 
The government was rolling out its payment programme in stages, so he 
could collect data on families in villages that were included in the initial 
roll-out, and in comparable villages that were not. Within a few years, his 
team had data suggesting that the programme, dubbed PROGRESA, was 
working remarkably well. Visitation to health clinics was 60% higher in 
participating communities than in the control group. Children in those 
communities also had a 23% reduction in illness and an 18% reduction 
in anaemia. Overnight hospital visits halved across several age ranges. 

These data helped to solidify support for the programme. Now known 
as Prospera, it covers almost all of Mexico’s poorest citizens and has
inspired similar initiatives across Latin America and into Africa. 

“PROGRESA was one of the first major national programmes of its 
kind to get a rigorous evaluation,” says William Savedoff, who works 
on aid effectiveness and health policy at the Center for Global Develop- 
ment, a think tank in Washington DC. “Today conditional cash-transfer 
programmes are some of the most heavily evaluated programmes in the 
world, and that is I think a direct consequence of the Mexican experience.”

The idea of developing hard evidence to test public policies was 
bubbling up in parallel in the United States. One of the first trials began
in 1994 with a small initiative to analyse the effect of supplying text- 
books and uniforms as well as basic classroom improvements to a group 
of schools in Kenya. Economist Michael Kremer at Harvard University in 
Cambridge had taught in Kenya years earlier. A friend of his who worked 
for a non-profit group was initiating the programme, and Kremer sug- 
gested that the group roll it out as an experiment. “I didn't necessarily 
expect anything to come of this,” he says. 

Working with the group, Kremer collected data on students in 
14 schools, half of which received the intervention. School attendance 
increased, but test scores did not. Similar results came from an experiment 
in 1995 that involved 100 schools. That trial suggested that providing
textbooks had little effect on average test scores, owing perhaps to language
challenges — the textbooks were in English, which was not the native 
language for many students. Students who were already scoring higher 
than their peers, however, pulled further ahead if they had the books. 

Kremer continued to run RCTs of other programmes, but it was 
Duflo, then a student of his, who pushed the idea into the
mainstream. Duflo’s 1999 dissertation looked in part at an education initiative
in Indonesia that had built 61,000 primary schools over 6 years in the 
1970s. She wanted to test a common concern that such a rapid expansion
would lead to a decline in the quality of education, thereby offsetting any 
gains. Running an experiment was impossible, but Duflo was able to use 
data on the differences across regions to show that the programme had, 
in fact, increased educational opportunities as well as wages. 

This and other early work inspired Duflo to look at RCTs as a way to 
generate data and definitively measure the effectiveness of policies and 
programmes. “As soon as I had a longer time horizon and some money I 
started working on setting some up,” she says.

One of Duflo’s early papers, published in 2004, capitalized on a 1993
amendment to India’s constitution that devolved more power over pub- 
lic investments to local councils and reserved the leadership of one-third 
of those councils, to be chosen at random, for women. Duflo realized 
that this effectively created an RCT that could test the effect of having
women-led councils. In analysing the data, she found that councils led 
by women boosted political engagement by other women and directed 
investment towards issues raised by them. In some areas, women are in
charge of obtaining drinking water, for instance, and councils led by
women typically invested more in water infrastructure than did those
run by men. “The scale of the policy and the topic were at the time
unusual,” Duflo says. “It gave me a sense of the range of things that the
tool could possibly cover.”

By the early 2000s, the randomistas 
were on the upswing. In 2002, Karlan, one of Duflo’s students, joined 
with her and other researchers to form Development Innovations — now 
known as Innovations for Poverty Action — in New Haven. The follow- 
ing year, Duflo co-founded what is now known as the Abdul Latif Jameel 
Poverty Action Lab (J-PAL) in Cambridge with fellow MIT economists 
Abhijit Banerjee and Sendhil Mullainathan. 

The work quickly expanded, and J-PAL has now run nearly 600 evalua- 
tions in 62 countries, and trained more than 6,600 people. One of Duflo’s 
latest projects will revisit her dissertation on education in Indonesia, only 
this time with secondary schools and randomized control groups. “We 
will have a randomized version of a paper on the benefits to education 
soon I hope,” Duflo says.


VENTURE CAPITAL 

One enthusiastic convert to the randomista philosophy is Rajiv Shah, 
a Gates Foundation official who became head of USAID in 2010. Once 
there he created a fund called Development Innovation Ventures 
(DIV) to test and scale up solutions to development problems, and he 
enlisted Kremer as its scientific director. The goal, Shah said, was to 
“move development into a new realm” through the use of evidence. 

Since then DIV has invested in more than 100 development projects, 
and nearly half involve RCTs. One, conducted in Kenya by a pair of 
researchers from Georgetown University in Washington DC, tested 
a simple method for reducing traffic accidents that involve mini- 
buses — collisions that Kremer calls major and increasing killers. “Two 
of them crash into each other, and 40 people die,” he says. 

In 2008, the researchers worked with more than 1,000 drivers to 
place stickers on buses that urged passengers to speak up about reckless 
driving. They then collected information from four major insurance
companies and found that claims for serious accidents had dropped 
by 50% on buses with stickers compared with those without. DIV 
provided a grant to conduct a larger trial — which found that claims 
dropped by 25-33% — and a second grant of nearly $3 million to help 
to scale up the project throughout Kenya. 

“The really big win is when developing countries, or firms or NGOs 
[non-governmental organizations] change their policies,” Kremer says.
But one question now facing DIV is whether such a strategy — or 
indeed any project that proves effective in one setting — can be repack- 
aged and deployed in other countries, where different cultural factors 
are at play (see Nature 523, 516-518; 2015). 


SCALE UP 

Effecting policy change is the precise aim of the Global Innovation 
Fund, which was launched in September 2014 with $200 million over 
5 years from the UK Department for International Development, 
USAID and others, and which follows the DIV model of rigorous test- 
ing. Interim director Jeffrey Brown, who is on loan from USAID, says 
that the fund has already received more than 1,800 applications for 
projects in 110 different countries and will be announcing its first suite 
of grants later this year. “We are essentially trying to become a bridge 
over the valley of death for good development ideas,” he says. 

But such organizations still provide only a tiny fraction of the bil- 
lions of dollars that are spent each year on development aid, let alone 
the trillions of dollars that are spent by governments on domestic 
social programmes. Even at lending institutions that have taken this 
evidence-based framework on board, the portion of investments that 
is covered by rigorous evaluations 
is small. 

At the World Bank, which started 
a Development Impact Evaluation 
division in 2005, the number of pro- 
jects receiving formal impact evalu- 
ations — through RCTs or other 
means — rose from fewer than 20 in 
2003 to 193 in 2014, mostly covering 
things such as agriculture, health 
and education. But that still represents just 15% of the bank's projects, 
says evaluation-division head Arianna Legovini, who leads a team of 
23 full-time staff and has an annual budget of roughly $18 million. 
Although many of these evaluations more than pay for themselves over 
the long term, one constraint is the up-front cost: the average price 
of an impact evaluation is around $500,000. “If I did not have donor 
funding,” she says, “these studies just would not happen.”

The World Bank is trying to make the most of its resources by work- 
ing directly with developing countries on implementation. More than 
3,000 people have attended its workshops and training sessions since 
2005, most of whom were government officials in developing countries 
that are receiving funds from the bank. 

The bank is also making efforts to assess the impact-evaluation 
programme itself — although the analysis is based largely on whether 
payments for projects are made on time as a proxy for implementa- 
tion of the initiatives. An analysis by Legovini and two of her team 
suggests that development projects that undergo a formal impact 
analysis are more likely to be implemented on time than are those 
that do not have evaluations, probably because of the extra attention 
that is given to initial set-up, roll-out and monitoring.

SCALE THE HEIGHTS

The growing influence of the randomized controlled trial in economic spheres
can be seen in the number of studies published each year. Most of the
increase is in four sectors, although many studies overlap.

Chart: studies published per year, 2000 to 2012, for all sectors combined and
for four sectors: health, nutrition and population; education; social
protection; and agriculture and rural development.

This finding is good news for individual projects, but it is also a 
potential thorn in the side of many RCTs. Positive effects seen in a 
trial setting may disappear when the programme is scaled up, gov- 
ernments take over and all the extra attention disappears (see Nature 
523, 146-148; 2015). 

“The fad now is let’s pilot it, and if it works we'll take it to scale,” says 
Annette Brown, who heads the Washington DC office of the Inter- 
national Initiative for Impact Evaluation, an organization that funds 
impact evaluations as well as meta-analyses of existing studies. Brown 
says that researchers and governments should probably conduct rig- 
orous studies when any programme is scaled up to ensure that the 
results continue to hold true — just as the government in Haryana 
is doing now. 


RANDOMIZATION BIAS 

From a political perspective, the strongest argument in favour of 
well constructed RCTs — that they do not lie — may also be the 
biggest factor working against them. Local politicians often want 
to cut ribbons and release money into communities, whereas inter- 
national donors, including governments and NGOs, want flagship 
programmes that show how they are improving the world. They do 
not welcome results showing that initiatives are not working. Even in
Mexico, Levy says, some of the subsidies that he fought against when 
he created PROGRESA have regained political favour. 

But the randomistas have been accused of succumbing to their 
own biases. Some fear that their insistence on the RCT has skewed 
research towards smaller policy questions and given short shrift to
larger, macroeconomic questions. One example comes from Mar- 
tin Ravallion. An economist at Georgetown University and a former 
research director at the World Bank, he cites an antipoverty pro- 
gramme in China that received $464 million from the bank in the 
1990s. Although the programme involved road construction, hous- 
ing, education, health and even conditional cash payments for poor 
families, a study based on data collected in 2005, 4 years after
disbursement ended, found minimal average impact on citizens. “That
was the only long-term study of integrated rural development, which 
is the most common form of development assistance,” Ravallion says.

Yet some families did benefit, and by combining statistics with 
economic modelling, he and his team showed that the difference lay 
in basic issues, such as education level. For Ravallion, the message is 
that aid is best targeted at the literate poor, or more broadly at issues 
such as literacy. “Governments need to know these things,” he says. 
“They can't just know about the subset of things that are amenable 
to randomization.” 

To Alexis Diamond, a former student of Duflo’s who manages 
project evaluations at the International Finance Corporation, the 
private-sector development arm of the World Bank in Washington 
DC, the debate between the randomistas and the old-guard econo- 
mists is in many ways about status and clout. The latter have spent 
their careers delving into ever more complex and abstract models, he 
says. And then “the randomistas came along and said ‘We don't care
about any of that.’ This is about who has a seat at the table.”

Diamond says that he tries to strike a balance at his organization, 
where most evaluations still rely on a mixture of quantitative and 
qualitative data, including expert judgement. 

Duflo shrugs off the debate and says that she is merely trying to 
provide government officials with the information — and tools — that 
they need to help them spend their money more wisely. “The best use 
of international aid money should be to generate evidence and lessons 
for national governments,” she says. 

She points to an anti-pollution programme in industrial plants in the
Indian state of Gujarat. Partnering with a group of US researchers, the 
state ran an experiment in 2009 that divided nearly 500 plants into 
2 groups. Those in the control group continued with the conventional 
system, in which industries hire their own auditors to check compli- 
ance with pollution regulations. The others tested a scheme in which 
independent auditors were paid a fixed price from a common pool. 
The hope was that this would eliminate auditors’ fear of being black- 
balled for filing honest reports. And it did: independent auditors 
were 80% less likely to falsely give plants a passing grade, and many 
of the industrial plants covered by those audits responded by curb- 
ing their pollution. In January, regulators rolled out the programme 
across the state. 
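
The design of a two-arm trial of this kind can be sketched in a few lines of code. The example below is a minimal illustration and not the Gujarat team's analysis: the plant count, the simulated false-pass rates and the function names are all hypothetical, chosen only to show random assignment followed by a difference-in-means comparison.

```python
import random
import statistics


def assign_groups(units, seed=0):
    """Randomly split units into equal-sized treatment and control groups."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def difference_in_means(treated, control):
    """Estimated average effect: mean treated outcome minus mean control outcome."""
    return statistics.mean(treated) - statistics.mean(control)


# Hypothetical data: 480 plants; outcome 1 means an audit falsely reported a pass.
plants = range(480)
treatment, control = assign_groups(plants)

rng = random.Random(1)
# Simulated false-pass rates, made up for illustration: 30% control, 6% treatment.
control_outcomes = [1 if rng.random() < 0.30 else 0 for _ in control]
treatment_outcomes = [1 if rng.random() < 0.06 else 0 for _ in treatment]

effect = difference_in_means(treatment_outcomes, control_outcomes)
print(f"Estimated change in false-pass rate: {effect:+.2f}")
```

Because assignment is random, the two groups should be comparable on average, so a difference in outcomes can be attributed to the intervention rather than to pre-existing differences between plants.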

“My hope, in a best-case scenario, is that in the next ten years you 
are going to have many, many of these projects run as a matter of 
course by governments in the spaces where they want to learn,” Duflo 
says. SEE EDITORIAL P.135


Jeff Tollefson is a reporter for Nature in New York. 
Discarded shrimp shells contain nutrients that could be used to enrich animal feed.


Don’t waste seafood waste 


Turning cast-off shells into nitrogen-rich chemicals would benefit economies and 
the environment, say Ning Yan and Xi Chen.


Every year, some 6 million to 8 million
tonnes of waste crab, shrimp and
lobster shells are produced globally,
about 1.5 million tonnes in southeast
Asia alone. Whereas 75% of the
weight of a tuna fish can be extracted as 
fillets, meat accounts for only around 40% 
of a crab’s mass. 
In developing countries, waste shells 
are often just dumped in landfill or the 
sea. In developed countries, disposal can
be costly: up to US$150 per tonne in
Australia, for example. 

Yet shells harbour useful chemicals — 
protein, calcium carbonate and chitin, a 
polymer similar to cellulose, but which 
contains nitrogen (see ‘Shell biorefinery’). 
The potential value of such shells for the 
chemical industry is being ignored. Scien- 
tists should work out sustainable ways to 
refine crustacean shells, and governments 
and industry should invest in using this
abundant and cheap renewable resource.
Dried shrimp shells are valued at a mere 
$100-120 per tonne. They can be ground 
down and the powder used as an animal- 
feed supplement, bait or fertilizer, as well 
as in chitin production. The return is not 
much more for agricultural residues and 
wastes: corn stover and wheat straws, which 
are burned for heat or refined into chemi- 
cals, sell for $50-90 per tonne. 
Crustacean shells are 20-40%
protein, 20-50% calcium carbonate and
15-40% chitin. What could these parts be 
used for? 


Protein is good for animal feeds. For 
example, Penaeus shrimp shells contain all 
the essential amino acids and have a nutrient 
value comparable to that of soya-bean meal. 
Today, the protein is not being used because 
the current processing methods destroy it. As 
livestock breeding rises rapidly, waste crus- 
tacean shells from southeast Asia could be 
transformed into protein-rich animal feed 
with an annual market value of more than 
$100 million, according to World Bank data. 


Calcium carbonate has extensive 
applications in the pharmaceutical, agri- 
cultural, construction and paper industries. 
It currently comes mainly from geological 
sources such as marble and limestone. These 
sources are plentiful but might contain heavy 
metals that are difficult to remove. Chalk 
from shells would thus be better for human 
consumption, for example as a constitu- 
ent of pills. People might also find it easier 
to accept tablets that originate from food 
sources than from rocks. 

The market price of ground calcium 
carbonate is around $60-66 per tonne for 
coarse particles, which are used in construc- 
tion, pigments, fillers and soil treatments. 
Ultrafine particles, which can be used to 
improve the properties of rubber and plas- 
tics, can reach an astonishing $14,000 per 
tonne. Even if the calcium carbonate from 
southeast Asian crustacean shells was pro- 
cessed into only the cheapest coarse parti- 
cles, it could have an annual market value of 
up to $45 million. 
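
The upper figure can be checked with a back-of-the-envelope calculation using only numbers quoted in this article: about 1.5 million tonnes of shell waste per year in southeast Asia, a calcium carbonate content of 20-50% and a coarse-particle price starting at $60 per tonne. The sketch below assumes complete recovery of the calcium carbonate, which is an idealization, and the function name is purely illustrative.

```python
def annual_value_usd(shell_tonnes, fraction, price_per_tonne):
    """Market value of one shell component, assuming 100% recovery."""
    return shell_tonnes * fraction * price_per_tonne


SHELL_WASTE_TONNES = 1.5e6  # southeast Asia, per year (figure from the text)
COARSE_PRICE_USD = 60       # lowest quoted price per tonne of coarse particles

# Calcium carbonate makes up 20-50% of shell mass.
low = annual_value_usd(SHELL_WASTE_TONNES, 0.20, COARSE_PRICE_USD)
high = annual_value_usd(SHELL_WASTE_TONNES, 0.50, COARSE_PRICE_USD)

print(f"${low / 1e6:.0f} million to ${high / 1e6:.0f} million per year")
```

At the top of the composition range, 750,000 tonnes of calcium carbonate at $60 per tonne gives the $45 million ceiling.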


Chitin is a linear polymer and the second most abundant natural
biopolymer on Earth (after cellulose). It is found in fungi, plankton and
the exoskeletons of insects and crustaceans, and organisms generate
about 100 billion tonnes of chitin every year. Currently, the polymer and
its water-soluble derivative, chitosan, are used in only a few niche areas
of industrial chemistry, such as cosmetics, textiles, water treatment and
biomedicine. Its potential is much greater.

SHELL BIOREFINERY

Crustacean shells contain three primary chemicals that have many industrial
uses. Developing a sustainable way to refine them could add billions of
dollars to the bioeconomy.

Shell waste, after fractionation, yields:

Product: calcium carbonate
Use: Pharmaceutical, agricultural, construction and paper industries,
including pigments, fillers, soil treatments, rubber and plastics.

Product: chitin
Use: Nitrogen-rich chemicals for pharmaceuticals, cosmetics, textiles, water
treatment, household cleansers, soaps, carbon dioxide sequestration.

Product: protein
Use: Fertilizers and animal feeds.

Unlike most other forms of biomass such as cellulose, chitin contains
nitrogen. Nitrogen-containing compounds, widely used in the
pharmaceutical industry, carbon dioxide fixation, textiles and beyond, are
crucial for modern life. For example, the nitrogen-containing organic
compound pyrazine is integral to several best-selling drugs such as
eszopiclone (for sleeping difficulties) and varenicline (to treat nicotine
addiction). Ethanolamine (ETA) is used in power plants for CO2
sequestration and in skin-friendly soaps, household cleansers and
surfactants. Nitrogen-containing chemicals have a huge market: about
2 million tonnes of ETA are used a year globally, with annual sales of
around $3.5 billion.

The industrial production of nitrogen compounds involves fossil fuels and
energy-intensive processes. First, nitrogen gas must be converted into
ammonia through the Haber process, which is notorious for its low reaction
efficiency. This process alone accounts for an estimated 2-3% of global
energy consumption. For every mole of nitrogen gas consumed,
three moles of hydrogen gas, derived from
fossil fuels, are used.
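
The mole ratio quoted above corresponds to the overall reaction of the Haber process, written out here for reference:

```latex
\mathrm{N_2} + 3\,\mathrm{H_2} \longrightarrow 2\,\mathrm{NH_3}
```

By mass, using standard molar masses, this works out to roughly 0.18 tonnes of hydrogen for every tonne of ammonia produced.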

Further processing is complex and 
expensive. For instance, generating ETA 
requires six steps: hydrogen production 
from coal or natural gas; nitrogen isolation 
from air; ammonia synthesis; ethylene pro- 
duction from crude-oil cracking; conversion 
of ethylene into ethylene oxide; and then 
conversion of ethylene oxide into ETA. 

Chitin might be a more suitable starting 
point for ETA production. With carbon, 
nitrogen and oxygen already bound in the 
polymer, only one step is needed to make 
ETA. Another five chemicals have been 
derived from chitin in a single step and the 
list is growing. So far, however, this has been 
achieved on only a small scale in the lab.


CHEMICAL CHALLENGES 

Extracting chemicals from waste shells with 
existing methods is destructive, wasteful and 
expensive. It requires separating out the dif- 
ferent components, a process known as frac- 
tionation. Protein is removed with sodium 
hydroxide solution and the decomposition 
of calcium carbonate uses hydrochloric 
acid — both are corrosive and hazardous 
solvents. 

To make chitosan, chitin is treated with 
40% concentrated sodium hydroxide solu- 
tion. The production of 1 kilogram of 
chitosan from shrimp shells requires more 
than 1 tonne of water. 

As a result, good quality chitin can 
cost up to $200 per kilogram, although 
the starting material is cheap. The global 
industrial use of refined chitin (in mem- 
branes, drug delivery, food and cosmetics) 
is low: around 10,000 tonnes per year. Few
chitin facilities exist; China, Japan, Thai- 
land and Indonesia have a few. The trans- 
formation of chitin or chitosan to other 
chemicals poses further problems. Natural 
chitin is a crystalline material that prevents 
reagents from easily accessing the polymer 
chains. Under harsh reaction conditions, 
the chains easily undergo side reactions to 
form myriad complex compounds. Sepa- 
ration of the bio-based products from the 
reactor is often laborious. 

In our view, these challenges are no greater 
than those in processing woody biomass 
into biofuels and other chemicals, which 
took two decades to move from the lab to 
commercial scales. 

Establishing a profitable, sustainable 
industry from shell waste is going to take 
creative chemistry. It needs a sustainable 
fractionation method to separate proteins, 
calcium carbonate and chitin — one that 
avoids corrosive or hazardous reagents and 
minimizes waste. 

New technologies are emerging. For 
example, teams in Mexico and the United 
Kingdom demonstrated a lactic-acid 
fermentation process for chitin production
in the lab and in a pilot plant in the early
2000s. The process converted up to
30-50 kilograms of shell waste in a single 
reactor. A mixture of bacteria that consumes 
proteins and decomposes calcium carbonate 
has been developed by groups in the United 
Kingdom, United States and China.
Protein hydrolysates and calcium lactate are
by-products that are useful for animal feed 
and calcium supplements. 


GET CRACKING 

Another option would be to design and use 
ionic liquids (liquid organic compounds 
with ionic functional groups) that can dis- 
solve carbohydrate polymers and extract 
chitin. Chitin polymers produced in this 
way have long chains and a high molecu- 
lar weight, and can be spun into fibres and 
films for wound dressings and water treat- 
ment, for example. 

Researchers also need to explore physical, 
solvent-free methods for shell fractionation. 
Ball milling (placing materials with metal 
balls in a spinning cylinder) may be used to 
grind the shells finely and break apart crys- 
tals. Combining chemical and mechanical 
forces might prove advantageous. For exam- 
ple, using a ball mill and an acid catalyst can 
degrade wood without heating. Combin- 
ing steam explosion (a technique that uses 
superheated steam and sudden pressure 
release) with acid is another way to liberate 
the shell’s components. 

Ball milling and steam explosion have 
been used for woody biomass refining at 
a pilot scale but few people have noticed 
the potential of these techniques for waste 
shells. (In collaboration with the Chinese 
Academy of Sciences’ Institute of Process 
Engineering in Beijing, our group at the 
National University of Singapore aims to 
have a pilot demonstration for shells
running in a few years.)

Converting chitin into small nitrogen-containing
chemicals, such as derivatives
of ETA and of the widely used
organic solvent furan, is developing fast,
although it is still in its early stages. It may take
at least five years to scale up the process 
and another ten years to commercialize 
it. Future investigations need to explore 
routes from chitin to other chemicals, 
enhance product yields through improved 
catalysis and pre-treatments and ease the 
separation of products. 

We propose that a processing pipeline be 
developed for refining waste shells, just as 
woody biomass (composed mainly of cellu- 
lose, hemicellulose and lignin) is separated 
and converted into a range of products in 
one facility. That development took the
cooperation of many parties, propelled by 
public concern over energy security and 
climate change. It also required financial
support from governments and the chemical
and fuel industry. Shell-waste biorefineries 
will create new industrial opportunities in 
southeast Asia and beyond. 

Strong support from policymakers, 
research institutes, governments, funders 
and the public is key. Fundamental research 
from scientists worldwide is urgently 
needed to overcome the technical barriers. 


SHELL REFINERY 
In the next five years, a multimillion-dollar 
project should be launched to establish the 
first processing pipeline using new tech- 
nology. The project should be supported 
by governments of nations rich in shell 
waste, and executed by researchers with 
complementary expertise, covering cataly- 
sis, materials science and engineering, food 
science and life-cycle assessment. Compa- 
nies — including producers and traders of 
shellfish, those associated with biocom- 
modities and biomaterials and others 
promoting renewable materials — should 
reassess the potential markets of an envi- 
ronmentally friendly and profitable waste- 
shell refinery and engage with research to 
commercialize emerging technologies. 

In the next decade, stringent regulations
should be implemented on the disposal of
waste shells, while providing incentives for
companies that use them.

Lobster shells harbour the nitrogen-rich compound chitin, which is used in pharmaceuticals.


Ning Yan is professor of green chemistry 
and Xi Chen is a research fellow in the 
Department of Chemical and Biomolecular 
Engineering, National University of 
Singapore, Singapore. 

e-mail: ning.yan@nus.edu.sg
EMBODIED COGNITION


A grasp on human thinking 


Elsbeth Stern weighs up two studies probing the idea of the brain as the body’s servant. 




How has Homo sapiens uncovered
the laws of nature, invented
technology and established culture and
institutions? Most scientists’ answers to 
that question have been top-heavy, refer- 
ring to language, symbolic reasoning and 
consciousness as unique human abilities on 
which comprehension, analysis, abstraction 
and reasoning are based. Since the 1950s, 
those abilities have increasingly become 
a focal point for psychological research. 
Encouraged by progress in informatics, 
researchers began to create digital models 
of the processes by which sensory input is 
selected by the brain, stored in the memory, 
connected to existing knowledge and used 
for elaboration. These ‘cognitive architec- 
tures’ were supposed to simulate and predict 
learning, reasoning, complex problem- 
solving and decision-making. 

This algorithmic focus on mental activities
ignores the fact that human beings
engage with evolutionary pressures using
their entire bodies, a point explored by
psychologist Guy Claxton in Intelligence in
the Flesh, and by philosopher Colin McGinn
in Prehension.

Rock art in the Cave of the Hands, Argentina, dating to between 9,500 and 13,000 years ago.

Intelligence in the Flesh deals with the 
unity of mind, brain and body in human 
information-processing, including higher 
cognition and academic learning. Claxton 
argues that humans would think and behave 
differently if their physiological functioning 
were different. For instance, there is research 
that shows how holding a cup of hot coffee 
or receiving other sensory input through 
the skin can influence judgement and deci- 
sion-making (L. E. Williams and J. A. Bargh 
Science 322, 606-607; 2008), a fact entirely
ignored in cognitive theories that confine 
themselves to visual and auditory input. 
The brain coordinates information, but it is 
the “servant, not master of the body”, notes
Claxton. 

McGinn’s focus in Prehension is the 
human hand. He is not the first to empha- 
size that thanks to their bipedal gait, early 
humans did not need their ‘forepaws’ for 
locomotion, freeing them to manipulate
the environment with the help of tools.
However, McGinn goes further, positing
that the multiple opportunities provided by
our hands shape our concepts of the mind.
Therefore we conceive cognitive processes
in manual terms, such as ‘grasping an idea’.

Intelligence in the Flesh: Why Your Mind Needs Your Body Much More Than it Thinks
GUY CLAXTON
Yale Univ. Press: 2015.

Prehension: The Hand and the Emergence of Humanity
COLIN MCGINN
MIT Press: 2015.

Claxton and McGinn value higher-order 
cognition and academic learning differ- 
ently. McGinn argues that the close inter- 
action between brain and hand allowed 
humans to find their evolutionary niche 
through the discovery of physical tools, 
as well as mental ones such as language 
or mathematical symbols. He claims that 
using the hands for pointing and communicating
resembles ‘air writing’, and
thereby facilitated the invention of script. 
Claxton, by contrast, thinks that cognitive 
competencies based on symbolic systems 
such as writing (which he pejoratively 
labels “Cartesian education” in reference 
to philosopher René Descartes’s idea of 
mind-body dualism) are overvalued, 
whereas handicrafts and vocational edu- 
cation are undervalued. A strong focus on 
academic learning and abstract reasoning 
does not meet the needs of the majority, 
he argues — to the point that this form of 
intelligence is essentially alien to humans. 
Meanwhile, McGinn posits that it is why 
our otherwise sparsely equipped species 
has survived. 

There is a bullish flavour to their modes 
of argument that shows that Claxton and 
McGinn are aware of how controversial 
their claims are. In fact, how new and robust 
is the science in each book? Criticism of the 
shortcomings of cognitive architectures is 
no novelty. Since the 1980s, the evolutionary 
aspects of human behaviour and cognition 
have become a seminal topic throughout 
psychology. It is widely acknowledged that 
humans are challenged by the fact that we 
are adapted to the world as it existed more 
than 30,000 years ago. It is fully accepted 
that we are born endowed with percep- 
tual and behavioural programs that were 
adaptive for our earliest ancestors and that
still affect our behaviour, information- 
processing and emotional functioning. So 
when criticizing a Cartesian view of human 
learning, the authors are preaching to the 
converted. 

Both books are slippery in their dealings 
with state-of-the-art research. McGinn 
almost entirely ignores empirical psychol- 
ogy research and instead provides evidence 
based mostly on plausibility — for instance, 
when he claims that humans have privileged 
access to geometry because they can form 
circles and triangles with their fingers. 
There is prominent research confirming 
his emphasis on the pivotal role of hand- 
brain interaction in human cognition, 
including dozens of studies on the impor- 
tance of gesturing in learning. Psychologist 
Susan Goldin-Meadow’s Hearing Gesture 
(Harvard University Press, 2003) is one. 
McGinn also refers to the views of develop- 
mental psychologist Jean Piaget regarding 
sensorimotor activity as the foundation 
of cognition in early child development. 
Yet for more than 30 years, psychologists 
have shown that the brains of newborns 
are endowed with core knowledge that pre- 
pares them to represent information about 
objects, quantities and actions long before 
they can grasp with their hands. 

Claxton cherry-picks from psychol- 
ogy and neuroscience literature. When 
he attacks conventional school education, 
he provides anecdotal evidence about 
unhappy children, but ignores evidence- 
based attempts to improve schooling 
— for instance, by bringing everyday 
experience into the teaching of science
and mathematics. Claxton’s claim
that performance in intelligence
tests is unrelated to factors important
in real life is not reflected in
state-of-the-art research, such as the more 
than 100 publications based on studies of 
the Lothian Birth Cohort, headed by psy- 
chologist Ian Deary of the University of 
Edinburgh, UK. Intelligence, Deary has 
shown, is not only significantly related 
to educational and professional outcome, 
but is also a factor in positive well-being, 
health and longevity. 

Intelligence in the Flesh and Prehension are 
eloquently written, refreshing and entertain- 
ing. But Claxton and McGinn fight many 
straw men, and often fail to provide evidence 
for provocative statements.


Elsbeth Stern is a psychologist and professor 
of teaching and learning research at the Swiss 
Federal Institute of Technology in Zurich. 
e-mail: elsbeth.stern@ifv.gess.ethz.ch 


Books in brief 


Applied Minds: How Engineers Think 

Guru Madhavan W. W. Norton (2015)

Engineers are titans of real-world problem-solving, yet are strangely 
invisible, notes biomedical engineer Guru Madhavan. In this riveting 
study of how they think, he puts behind-the-scenes geniuses
such as Margaret Hutchinson, who designed the first penicillin-production
plant, centre stage. And, in a feat of reverse engineering,
he shows how engineers’ methodology — rigorous analysis, testing 
and orientation towards solutions — is bedded in modular systems 
thinking, a mindset strong on visualizing structure, designing under 
constraints and weeding out weak goals in trade-offs. 


A River Runs Again: India’s Natural World in Crisis, from the 
Barren Cliffs of Rajasthan to the Farmlands of Karnataka 

Meera Subramanian PublicAffairs (2015)

In the middle of India’s boom, malnutrition among Indian children 
is rife. Journalist Meera Subramanian, in search of sustainable 
solutions for the subcontinent’s 1.2 billion people, criss-crossed it 
to meet scientists and citizens grappling with familiar dilemmas 
such as child marriage and polluting cooking stoves. Subramanian’s 
analysis is fresher when she takes on the inefficiencies and worse of 
‘big aid’, and her mapping of the micro solutions — such as village 
rainwater collection — suited to a country of small enterprises. 


Mindware: Tools for Smart Thinking

Richard E. Nisbett Farrar, Straus & Giroux (2015)

How do we decide whether theories are sound or knowledge is only
conjectural? Social psychologist Richard Nisbett has drilled into
decision-making to produce this “cognitive tool kit” of principles and
ideas to aid the process. Inspired by the “seamless web” of science
(the interdisciplinary seepage of methods and facts), he draws
on economics, psychology and logic for a rich haul. Expect insights
in areas ranging from the role of conformism in energy use to the
differences in Eastern and Western thinking, and tools from basic
statistics to the multipurpose heuristic KISS (keep it simple, stupid).


Inside the Machine: Art and Invention in the Electronic Age 

Megan Prelinger W. W. Norton (2015)

When electronics took off in the 1930s, US technology companies 
were suddenly forced to convey ‘invisible science’ visually. A bold 
brigade of commercial artists began to tackle the physics and 
components with creative brio — but this flowering withered in the 
1960s, when the workings of electronics had been absorbed into 
the culture. For this unusual and compelling study, cultural historian 
Megan Prelinger has gathered a trove of superb examples. Some are 
patently influenced by abstract artists such as Wassily Kandinsky, 
others by surrealism, concrete poetry and science-fiction illustration. 


Mess: One Man’s Struggle to Clean Up His House and His Act 
Barry Yourgrau W. W. Norton (2015)

Nineteenth-century philosopher Ralph Waldo Emerson wrote, 
“Things are in the saddle, / And ride mankind”. A thought resonant in 
a consumerist era, it might also be seen as a comment on hoarding, 
a condition now associated with obsessive-compulsive disorder. In
a memoir mixing sorrow and hilarity, self-confessed clutterer Barry
Yourgrau records how he jettisoned junk and traumatic memories by 
joining Clutterers Anonymous, poking at relevant neuroscience and 
working his way towards a rapprochement with things. Barbara Kiser 
Mary K. Gaillard’s theoretical-physics achievements include predicting the mass of the charm quark.


She did it all 


Val Gibson enjoys the autobiography of Mary K. Gaillard, 
the first female physics professor at Berkeley. 


The brilliant theoretical physicist Mary
K. Gaillard has made major contributions
to the standard model of particle
physics and to superstrings, a candidate
theory of everything. In 1981, she became
the first woman with a tenured position in
the physics faculty at the University of California,
Berkeley. Her frank autobiography,
A Singularly Unfeminine Profession, is an
honest, revelatory account of her many discoveries,
made as she battled gender bias and
faced the demands of raising three children.

Born in New Jersey in 1939, Gaillard has a 
“survival mechanism” born from an inherent 
belief in equality, nurtured by her parents and 
school, and a rebellious tendency to question 
the world around her. Having fallen in love 
with physics at school, she won a scholarship 
to Hollins College near Roanoke, Virginia. It 
included a year in Paris at École Polytechnique,
her first exposure to the culture that
was to become her nemesis. 

During college, Gaillard also spent two 
summers at Brookhaven National Labora- 
tory in Upton, New York, where she became 
hooked on high-energy particle physics. 
There, she met her first husband: Jean-Marc 
Gaillard, a postdoc at Columbia University 
in New York City. She did a graduate year 
at Columbia, then Jean-Marc was offered 
a post in Orsay near Paris. His colleagues 
advised Gaillard to accompany him, and to 
become “self-taught, like all great European
physicists”. The first year in Orsay became
“the worst year”, as Gaillard “learnt to be a
housewife” and was largely left on her own. 
Jean-Marc was then offered a six-year staff 
position at CERN, the European centre for 
particle physics near Geneva, Switzerland. 
Here, Gaillard became a long-term visitor for 
some 20 years. Through Jean-Marc’s connec- 
tions, she secured space in a shared basement 
office in the CERN theory group; commuted 
between Orsay and CERN; and worked on 
the difference between matter and antimatter. 
She was subjected to the “determined 
antifeminism” of theory-group leader Leon 
Van Hove, who became CERN director- 
general. He would ignore her and ask male 
colleagues about a project, for instance. Jug- 
gling research and her children, Gaillard 
became a major player in theoretical particle 
physics. Without the flexibility to interact 
with colleagues, she wrote many early papers 
alone. It is easy to sympathize when she tells 
of forgetting to collect her eight-year-old son 
Bruno from a music lesson in midwinter, or 
giving him the wrong bus fare so he was fined. 
In 1973, Gaillard spent a “pivotal year” as 
a visitor at Fermilab 
near Chicago, Illinois, 
which was buzzing 
with excitement about 
the proposed theory of 
weak interactions. She 
met Benjamin Lee,
with whom she predicted the mass of the
charm quark, gaining a “sort of star status”.
Back at CERN and working with giants such
as John Ellis and Dimitri Nanopoulos, she
turned to the decay modes of the Higgs boson,
the signature for gluons and the mass of the
bottom quark. She co-authored the paper
that introduced the term penguin diagram
for a type of loop-containing Feynman diagram
(a description of interactions between
subatomic particles). The same paper earned
Bruno, then nine, an acknowledgement for
“help with the calculations”.

A Singularly Unfeminine Profession: One Woman’s Journey in Physics
MARY K. GAILLARD
World Scientific: 2015.

When Gaillard headed back to the United 
States in 1981 for the tenured position at 
Berkeley, it was not with Jean-Marc but 
with her soon-to-be second husband Bruno 
Zumino, a supersymmetry theorist who died 
last year. Gaillard became a grande dame of 
particle physics, with positions on many com- 
mittees that shaped particle-physics research 
in the United States and, ultimately, the world. 

The story is as much about a thrilling 
period in particle physics as about Gaillard’s 
struggle to establish herself in a male-domi- 
nated sphere. Hers is the era of the standard 
model and its description of fundamental 
particles and forces. It has also seen the dis- 
covery of the Higgs boson, “a bug sloshing 
through molasses” as Gaillard describes it. 

Gaillard explains her contributions clearly 
and without equations; exquisite illustra- 
tions by her son Bruno are reproduced. A 
fine example is the paper, presented as a con- 
versation between herself, Lee and Jonathan 
Rosner, that used indirect experimental 
observations to predict the mass of the charm 
quark — three months before it was discov- 
ered (M. K. Gaillard et al. Rev. Modern Phys. 
47, 277-310; 1975). Now 76, Gaillard contin- 
ues to add to her broad portfolio, with a focus 
on superstrings and a desire to link theoretical 
predictions to experimental observation. 

In 1980, Gaillard produced the Report on 
Women in Scientific Careers at CERN. This 
addressed the fact that just 3% of CERN staff 
were women, and called for the elimination 
of gender discrimination through equality in 
promotion, maternity leave and provision of 
a full-day crèche. Only in 1994 was a female
experimental physicist, Fabiola Gianotti — 
who will become CERN director-general 
in 2016 — appointed to a senior position. 
Unfortunately, the current list of physicists in 
the CERN theory group shows no women in 
permanent senior positions. This is no reflection
on Gaillard. As a colleague comments in
the book: “She did it all!”


Val Gibson is an experimental particle 
physicist who has worked at CERN near 
Geneva, Switzerland. She is head of high- 
energy physics in the Cavendish Laboratory 
at the University of Cambridge, UK, and is a 
champion of equality and diversity. 

e-mail: gibson@hep.phy.cam.ac.uk 
Partner crop plants
with solar facilities 


About one million hectares 
of land will be required in the 
United States by 2030 to meet 
solar-energy targets (go.nature.com/2g5hkg).
Cultivating
carefully selected plants on such 
sites could offer a sustainable 
solution to meeting growing 
food and energy demands, 
particularly in regions with 
limited agricultural land 
and water resources (see, for 
example, go.nature.com/acixb7 
and go.nature.com/n2sysg). 

Photovoltaics (for producing 
electricity) and photosynthesis 
(for producing food, fodder or 
biofuel) both need sunlight. Large 
solar infrastructures protect 
vegetation from intense sun 
and strong winds, and regular 
washing of their surfaces provides 
water for the plants. Crops could 
be grown in the spaces between 
these structures to benefit from 
concentrated rainfall. These crops 
would reduce dust from disturbed 
soils, which could otherwise lower 
the efficiency of solar installations, 
and they would create extra 
revenue and employment. 

The benefits and trade-offs 
of such co-located systems 
are now being evaluated (see, 
for example, go.nature.com/acixb7
and S. Ravi et al. Environ.
Sci. Technol. 48, 3021-3030; 
2014). And solar operators and 
investors in North Africa, India, 
Mexico and the United States are 
already expressing an interest 
(S. R., personal communication). 
Sujith Ravi Temple University, 
Philadelphia, Pennsylvania, USA. 
sravi@temple.edu 


STEM teaching: use 
more innovations 


Two other concerns should be 
added to your prescriptions for 
improving teaching in science, 
technology, engineering and 
mathematics (STEM; see Nature 
523, 272-274 and 282-284; 2015). 
We know that smaller class 
sizes and classrooms designed
for active learning give better
academic outcomes (S. Cotner
et al. J. Coll. Sci. Teach. 42, 82-88;
2013), yet budgetary pressures 
discourage institutions from 
abandoning big lecture halls in 
favour of small classes. 

Also, there should not be 
separate faculty tracks for 
teaching and research. Teaching 
positions rarely include research 
support, so they do not offer the 
same academic opportunities as 
research faculty positions. 

These issues are ultimately 
about institutional and 
administrative buy-in. The 
success of STEM students 
depends on institutions investing 
in improved learning facilities 
and on administrators providing 
research, tenure and promotion 
opportunities for those who teach. 
Luke Holbrook Rowan University, 
Glassboro, New Jersey, USA. 
holbrook@rowan.edu 


STEM teaching: avoid 
Swiss-cheese effect


You propose a shift from 
traditional university lectures to a 
system that teaches the methods 
of scientific enquiry to students of 
science, technology, engineering 
and mathematics (STEM; see 
Nature 523, 272-274 and 
282-284; 2015). This move 
has clear merits, but systematic 
transfer of the requisite knowledge 
should not be abandoned entirely. 
A pioneer in active-learning 
practices, Roskilde University 
in Denmark has been using 
problem-based teaching and 
successfully involving students 
in research since 1972. Our 
experience shows, however, that 
there are potential pitfalls. 
Unless critical thinking is 
allied with a strong fundamental 
knowledge base, there is a risk 
that students will develop a 
‘Swiss cheese’ understanding of 
science — with a good grasp of 
their chosen subject areas but 
major gaps in others. This can 
produce niche researchers who 
lack a proper understanding of 
their wider field. 


We therefore advise retaining 
aspects of traditional education 
in an appropriate balance, which 
is then adjusted on the basis of 
student and course evaluations. 
Farhan R. Khan, Gary T. Banta 
Roskilde University, Denmark. 
Christina Sorensen University of 
Oslo, Norway. 
frkhan@ruc.dk 


The future of public 
trust in science 


The challenges of maintaining 
trust in science (see Nature 522, 
6; 2015) can be understood in 
terms of corrupting pressures 
that make it harder for scientists 
to do the good work to which 
many aspire. 

The sheer scale of science 
today is destroying colleague 
communities; it also demands 
‘objective’ metrics of quality, 
which are perverse and 
corruptible. These effects are 
compounded by imported 
commercial pressures. The 
idealism that motivated ‘little 
science’ is no longer plausible. 

Maintaining the public's trust 
in science calls for an urgent 
evaluation of its imperfections 
and vulnerabilities. We must 
identify what needs to be 
unlearned in the prevalent 
understanding of science: for 
example, we now know that any 
science-related policy problem 
poses more questions and 
solutions than can be derived 
from the illusory precision of 
models and indicators (a factor 
in the 2008 financial crisis). 

Social-media channels are 
starting to teach the public more 
about new views of science. The 
growth of ‘DIY science’, which
owes only minimal deference 
to established institutions, will 
eventually influence science 
education, and to good effect. In 
much the same spirit as citizen 
science has developed in parallel 
with established science, a 
movement of scientifically aware 
citizens could emerge within 
science. These citizens would 
develop an understanding of
the connection between science’s
internal problems, such as morale 
and quality assurance, and external 
pressures of the sort we describe. 
Jerome Ravetz University of 
Oxford, UK. 

Andrea Saltelli University of 
Bergen, Norway. 
jerome.ravetz@gmail.com 


Solar ovens beaten 
by rain and tortillas 


Solar ovens sometimes fall short 
of their promise as the gold 
standard of clean cooking, despite 
producing zero emissions (see 
L. S. Brown and W. F. Lankford
Nature 521, 284-285; 2015). 

In a solar-oven project funded
by the Central American Solar 
Energy Project (CASEP) in 
Nicaragua, participants generally 
reported large fuel savings. Yet 
objective measurements found 
that savings were not significant, 
and surveys indicated that 
users continued to cook on 
biomass-burning stoves. Data 
from thermometers inside solar 
ovens confirmed that oven usage 
was widely over-reported (see 
go.nature.com/oqff6a). 

Furthermore, although 
the 77 Nicaraguan women 
interviewed in 2014 (by S.V.) 
found CASEP’s empowerment 
training helpful, this did not result 
in self-sustaining community 
action after the project ended. 

In parts of Latin America, 
preparing tortillas accounts 
for more than half of cooking 
fuel usage (O. Masera et al. 
Energy Sustain. Dev. 11, 45-56; 
2007), but tortilla preparation 
is impossible with most solar- 
cooker designs because they are 
not hot enough. And in areas with 
a rainy season, solar cooking is 
impractical for half of the year. 
Improved biomass stoves and 
biogas might be more effective 
solutions in such regions. 
Gordon Bauer University of 
Oslo, Norway. 

Sarah Vukelich Williams 
College, Williamstown, 
Massachusetts, USA. 
gordon.bauer@gmail.com
A cure for catalyst poisoning


Compounds that are sensitive to the components of air are difficult to use in chemical reactions, requiring conditions that 
are tedious to set up. A simple, practical solution to this problem has finally been devised. SEE LETTER P.208 


MARCUS E. FARMER & PHIL S. BARAN 


Capsules or pills for drug delivery were
invented by the French pharmacists
François Mothes and Joseph Dublanc
in the early 1800s (refs 1, 2) as a reproducible method
for the consistent dosing and delivery of medi- 
cines and vitamins, and to maintain the sta- 
bility of pharmaceutically active ingredients. 
This remains the most common formulation 
mode for drugs — without it, pharmacists 
would have to carefully weigh out and dispense 
freshly prepared powders of drug substances to 
patients. Yet research chemists still have to do 
this for each compound used in their reactions. 
This is especially problematic when using 
reagents and catalysts that are sensitive to 
atmospheric water vapour, oxygen or carbon 
dioxide. On page 208 of this issue, Buchwald 
and colleagues (ref. 3) describe an ingenious solution
to this problem by ‘formulating’ sensitive
compounds in capsules, thereby eliminating 
the inconvenience associated with their storage 
and handling. 

Large and expensive sterilized boxes — 
known as dry boxes or glove boxes — contain- 
ing chemically inert gases are widely used to 
prevent sensitive catalysts and reagents from 
being exposed to components of the atmos- 
phere (Fig. 1). Dry boxes have enabled many 
useful discoveries for chemical synthesis in 
academic settings, but the industrial applica- 
tion of discoveries made in such conditions 
is hampered by the inconvenience associ- 
ated with assembly, maintenance and labour- 
intensive operation. Many useful and enabling 
chemical transformations thus remain under- 
used, or are completely ignored. It is surprising 
that this glaring unmet chemical need has not 
previously been addressed. 

The largest group of consumers for new 
chemical methods work in the pharmaceu- 
tical, agrochemical and materials sectors. 
Buchwald and co-workers therefore focused 
on developing a technique that would bring 
glove-box chemistry to the open bench, where 
the chemists from these industries feel most 
comfortable. To ensure the successful adoption 
of their technique, they set out to develop an 
approach that could easily transition a variety 
of synthetic methods from the glove box to the 
bench top without the need to substantially
alter pre-existing reaction conditions.

Figure 1 | Encapsulated reagents for dispensing air-sensitive compounds. a, Research chemists
typically store, manipulate and perform reactions involving air-sensitive reagents in glove boxes,
sterilized chambers that contain a chemically inert atmosphere. This is practically much more difficult
than performing reactions on the open bench. b, Buchwald and colleagues (ref. 3) report that air-sensitive
reagents can be conveniently used and stored on the open bench when sealed in capsules made from
paraffin wax, which melt on heating.

The authors were initially inspired by the 
example of potassium hydride — a moisture- 
sensitive reagent that is sold as a dispersion in 
paraffin wax, allowing it to be stored and 
manipulated without using a glove box. Buch- 
wald and colleagues therefore tried to form 
dispersions of reactants using molten paraffin 
wax, but obtained random, irreproducible 
distributions of the compounds within the 
resulting mixtures. 

To circumvent this problem, the researchers 
developed a technique that places reagents and 
catalysts in a paraffin-wax capsule. The cap- 
sules can be stored in the open air for months, 
and retain full activity even when dipped in 
water. Remarkably, the researchers observed 
that 2-pyridylzinc chloride dioxanate — an 
air-sensitive reagent that degrades in minutes 
to hours if not protected — can be stored on 
the bench top for more than a year without 
signs of degradation if it is sealed in a paraffin 
capsule. Furthermore, because paraffin wax is 
generally unreactive, the capsules can simply 
be added directly to reaction mixtures using 
common laboratory procedures. The capsule 
melts on heating, releasing its contents, and the 
molten wax does not interfere with the desired 
chemical reaction. 
To showcase the utility of this approach,
Buchwald and co-workers used their capsules
in synthetically useful reactions known as
nucleophilic fluorinations of aryl triflates (refs 4, 5;
see Fig. 1b of the paper, ref. 3). These reactions
require an oxygen-sensitive palladium cata- 
lyst and caesium fluoride (which is highly 
hygroscopic — that is, it rapidly absorbs 
moisture from the atmosphere). Exposure of 
these compounds to air usually sidetracks the 
reaction: oxygen causes the catalyst to be poi- 
soned (to lose its activity), whereas moisture 
causes the formation of unwanted byproducts. 
The authors therefore encapsulated a mixture 
of these compounds, added the capsules to 
reactions set up using standard laboratory 
conditions, and compared the outcomes with 
the same reactions run in a glove box (refs 4, 5). In
all cases, the reactions provided comparable 
results, indicating that the encapsulated rea- 
gents had been successfully protected from air. 

To further demonstrate the feasibility of their 
approach to enable the bench-top assembly of 
palladium-catalysed fluorination reactions, 
the researchers prepared capsules containing 
a mixture of three air-sensitive compounds 
that fluorinates a variety of substrates called 
aryl and heteroaryl bromides (ref. 6; see Fig. 2 of
the paper, ref. 3). As with the previous fluorination
procedure, reactions run on the bench top
using the prepared capsules gave yields
similar to those obtained in the glove box.
Next, Buchwald and colleagues showcased
their technique for a carbon-nitrogen
bond-forming reaction (an amination; see
Fig. 3 of the paper, ref. 3) that they had previously
developed in their laboratory and that is now
widely used in every branch of chemistry (ref. 7).
The reaction requires a palladium precatalyst
(a palladium compound that is converted
to an active catalyst during a reaction) and a
strong hygroscopic base. The authors encap- 
sulated the precatalyst and the base together, 
and found that the resulting mixture was 
stable when stored on a bench top for more 
than eight months. The result demonstrates 
that the compounds can coexist under these 
conditions even though the precatalyst is acti- 
vated by bases when in solution. This ‘amina- 
tion capsule’ worked as well on the bench top 
as reactions tediously prepared in a glove box (ref. 8).

As a finale, Buchwald and co-workers 
showed that the encapsulation technique 
could enable carbon-carbon bond-forming 
reactions known as Negishi cross-couplings, 
which use moisture-sensitive zinc reagents (see 
Fig. 4 of the paper, ref. 3). The encapsulation of one
such reagent, 2-pyridylzinc chloride dioxanate, 
with an appropriate precatalyst facilitated such 
reactions on the open bench in comparable 
yields to the analogous glove-box procedure (ref. 9).

Although the encapsulation approach carries 
many benefits, it will not completely eliminate 
the glove box because the capsules still need 
to be prepared in the absence of air. The wide- 
spread adoption of this approach for synthesis 
will also rely on the availability of the capsules, 
although we expect that vendors will expedite 
their commercialization and distribution. 

The thought-provoking technique opens 
up many avenues for exploring the reactivity 
of air-sensitive reagents and catalysts, both 
in academic settings and in the many areas of 
industrial science that require the rapid and 
automated preparation of libraries of structur- 
ally diverse compounds. If many catalysts and 
reagents become readily available as capsules, 
the influence of this approach will probably 
be seen in the pharmaceutical, agricultural 
and materials industries. It may not be too 
unrealistic to predict that these capsules will 
do for organic chemistry what Mothes and 
Dublanc’s pills did for medicine.


Marcus E. Farmer and Phil S. Baran are
in the Department of Chemistry, Scripps
Research Institute, La Jolla, California 92037, 
USA. 

e-mail: pbaran@scripps.edu 


1. Mothes, F. A. B. French patent 9690 (1834).
2. Wilbert, M. I. Am. J. Pharm. 85, 559-572 (1913).
3. Sather, A. C., Lee, H. G., Colombe, J. R., Zhang, A. & Buchwald, S. L. Nature 524, 208-211 (2015).
4. Watson, D. A. et al. Science 325, 1661-1664 (2009).
5. Lee, H. G., Milner, P. J. & Buchwald, S. L. Org. Lett. 15, 5602-5605 (2013).
6. Lee, H. G., Milner, P. J. & Buchwald, S. L. J. Am. Chem. Soc. 136, 3792-3795 (2014).
7. Ishihara, Y., Montero, A. & Baran, P. S. The Portable Chemist’s Consultant: A Survival Guide for Discovery, Process, and Radiolabeling (Apple, 2013).
8. Fors, B. P. & Buchwald, S. L. J. Am. Chem. Soc. 132, 15914-15917 (2010).
9. Colombe, J. R., Bernhardt, S., Stathakis, C., Buchwald, S. L. & Knochel, P. Org. Lett. 15, 5754-5757 (2013).


REGENERATIVE BIOLOGY


Maintaining liver mass 


A previously under-appreciated subset of liver cells has been found to contribute
to the day-to-day maintenance of liver mass in mice. The cells are induced and 
supported by signals from an adjacent vein. SEE ARTICLE P.180 


KENNETH S. ZARET 


The liver has remarkable regenerative
powers. Many studies have focused on
the ability of different types of cell to
replenish both the liver and the bile ducts after 
damage’, but less clear is how the liver self- 
renews when cells die naturally. Such homeo- 
static renewal ensures that the liver maintains 
an appropriate mass, and so is crucial for 
health. On page 180 of this issue, Wang et al.’ 
shed light on this issue, focusing on an under- 
appreciated, self-renewing cell population in 
the undamaged livers of mice. It seems that 
liver cells themselves might function as ‘stem 
cells’ for homeostasis when they are positioned 
in a specialized zone of the liver.

After food has been eaten, nutrients, along 
with any toxins that have been ingested, are 
absorbed by the intestine, pass into the bloodstream
and are transported directly to the liver
for metabolic processing. Liver cells (hepatocytes)
control metabolism and act as the first
line of defence against toxins. But hepatocytes 
can be damaged in the line of duty, and chronic 
liver damage is a major health concern world- 
wide. Identifying the cell populations in the 
liver that can repair damage has therefore been 
a topic of intensive research. 

Consider the vascular plumbing of the 
liver. Nutrient-rich blood from the intestines 
travels along the portal vein, arriving in the 
portal zone of the liver, where bile ducts and 
the hepatic artery also reside. The blood then 
courses through sinuses in the liver mass, is 
exposed to hepatocytes for metabolite and 
toxin exchange, and collects in the central 
vein. Thus, the portal zone must contend with 
greater toxic insults than the central zone. 
Indeed, periportal hepatocytes respond to 
most forms of liver damage'*~° and can also 
contribute to homeostatic cell renewal'”.
However, the liver’s anatomy indicates that
the central zone might constitute a more-protected
reservoir of cells, and might be a
preferable location for cells involved in homeostatic
self-renewal.

Figure 1 | A contributor to everyday liver regeneration. The endothelial cells lining the central vein of
the liver emit Wnt signals that induce the expression of Wnt-responsive genes in adjacent pericentral liver
cells (hepatocytes). Wang et al.’ report that these signals also stimulate the proliferation of pericentral
hepatocytes. The cells give rise to descendants that reside beyond the reach of Wnt signals, and that
replicate more slowly than their parents (some of the descendants have more than one nucleus). In this
way, pericentral hepatocytes contribute to the maintenance of liver mass.

Hepatocytes in the pericentral region imme- 
diately adjacent to the central vein are known 
to replicate slightly faster than other hepato- 
cytes in normal conditions’, and to be the only 
hepatocyte population that expresses genes 
activated by the Wnt signalling pathway”*. 
Wang and colleagues used genetic techniques 
in mice to indelibly label cells expressing a 
Wnt-responsive gene, such that these cells
and their descendants fluoresced. They then 
tracked this fluorescent lineage and showed 
that pericentral hepatocytes self-renew — the 
cells remain close to the central vein and are 
not normally replaced by other hepatocytes. 
Over time, these cells give rise to descendants 
outside the pericentral zone that can replen- 
ish up to 40% of the liver’s mass under normal 
conditions (Fig. 1). These findings lead to the 
question of how pericentral hepatocytes dif- 
fer from other hepatocytes, and whether such 
differences depend on proximity to the central 
vein. Consistent with this, it is now recognized 
that a stem cell’s identity can be dependent on 
the signals that it receives from its local envi- 
ronment”. 

Mammalian cells typically carry two copies 
of each chromosome, but most hepatocytes 
carry several copies of this chromosome com- 
plement and exhibit chromosomal imbalances 
on division, making them less than ideal can- 
didates for the population that replenishes the 
liver''. Wang et al. observed that many peri- 
central hepatocytes have the normal chromo- 
some complement, and so seem better suited 
to replicating their genomes faithfully when 
they divide. Finally, the authors found that 
Wnt signals released from the endothelial cells 
that make up the central vein are required to 
maintain the proliferation of pericentral hepat- 
ocytes and thus their function in replenishing 
liver cells. 

The discovery that pericentral hepatocytes, 
along with other hepatocytes’, contribute 
to liver homeostasis opens up many avenues 
for study. For instance, the relative contribu- 
tion of each of these cell types to homeostatic 
regeneration is not known. The role of peri- 
central hepatocytes in regeneration following 
non-periportal forms of liver damage also 
remains to be determined. Could a better 
understanding of the cells enable self-renewal 
to be enhanced? Manipulating the Wnt path- 
way in vivo might provide insights along these 
lines, as has been shown for liver ‘organoids’ 
grown in vitro”. 

Perhaps the most important question is 
whether the pericentral hepatocytes behave 
as a niche-dependent stem-cell population’”’; 
that is, whether any hepatocyte placed in the 
pericentral region, under the influence of 
endothelial Wnt signalling, would become 


Wnt-responding, faster-replicating cells, 
functioning like the original pericentral cells. 
This crucial test could be carried out by ablat- 
ing pericentral cells, for example through the 
transient induction of diphtheria toxin, and 
determining whether other hepatocytes take 
on their role. If it could be shown that any 
hepatocyte — or at least any hepatocyte with 
a normal chromosome complement — when 
placed in the pericentral region or exposed 
to the correct Wnt signalling could be ‘acti- 
vated’ to become a more-efficient cell for liver 
homeostasis, this could have an impact on 
treatments for chronic liver disease. 

But almost all hepatocytes, regardless of 
their position in the liver, can self-renew and 
contribute to liver homeostasis’*. Thus, it 
may be that it is not appropriate to consider 
whether any one hepatocyte population is 
the true homeostatic stem cell. Instead, a 
more pertinent question might be whether 
some hepatocytes are better at self-renewing 
than others. 

Interestingly, the authors found that pericen- 
tral hepatocytes are the only adult hepatocyte 
population to express Tbx3, a transcription 
factor that is essential for the development of 
hepatoblasts"’, the precursors of hepatocytes 
and bile-duct cells in the early embryo. Direct 
signalling from adjacent endothelial cells pro- 
motes embryonic hepatoblast growth”. Thus, 
the pericentral hepatocytes live in an environ- 
ment that shares features with embryonic liver 
development. But hepatoblasts are bipotential, 
whereas the pericentral hepatocytes seem to
give rise to hepatocytes only, indicating differences
in the networks that regulate these cell
types. Understanding the similarities and differences
between the pericentral hepatocytes
and hepatoblasts, and between pericentral
hepatocytes and other hepatocytes in the liver,
is sure to provide crucial insights for the liver
and regeneration research fields.


DNA REPLICATION


Strand separation unravelled


The DNA double helix must be separated into single strands to be duplicated. 
A structure of the Mcm2-7 helicase enzyme responsible for this activity yields 
unprecedented insight into how the process is initiated. SEE ARTICLE P. 186 


MATTHEW L. BOCHMAN 
& ANTHONY SCHWACHA 


The successful replication of double-stranded
DNA, an essential part of cell
division, depends on a helicase enzyme
that separates the two component strands. 
Although simple helicases have been exten- 
sively studied’, much less is known about the 
complex replicative helicases found in eukary- 
otes (the group of organisms that includes 
animals, plants and fungi). But that is about to 
change. On page 186 of this issue, Li et al.’ cap- 
italize on advances in cryo-electron micros- 
copy’ to resolve the structure of a eukaryotic 
helicase, Mcm2-7, to a near-atomic resolution
of 3.8 angstroms — around five times higher 
than the best Mcm2-7 structure reported so 
far*. Combined with previous studies, this 
structure indicates how a key step in DNA rep- 
lication occurs: the initial ‘melting’ of double- 
stranded DNA into single strands. 

Figure 1 | Structure of a strand separator. a, The Mcm2-7 helicase is a doughnut-shaped enzyme
composed of six different subunits (individual subunits not shown) that is vital for DNA replication’. Li
et al.” resolve a structure that contains two copies of this hexamer, which they suggest is involved in the
DNA melting process that converts double-stranded DNA (dsDNA) into single strands to initiate DNA
replication. In the authors’ structure, the two hexamers are tilted at a 14° angle relative to one another. b,
This conformation forms a narrow central channel between the two hexamers through which dsDNA can
pass, together with two exit channels. c, dsDNA pumped through the central channel might be extruded
through the exit channels as single-stranded DNA ‘rabbit ears’, which can then be replicated.

Mcm2-7 has a central role in eukaryotic
DNA replication. Like similar helicases from
bacteria, archaea and viruses, it unwinds double-stranded
DNA (dsDNA) by binding one
strand in its central channel, excluding the
other. Energy, provided by the enzyme’s ability
to hydrolyse ATP molecules, enables the
complex to translocate along the bound DNA,
resulting in unwinding of the complementary
strand’. But the doughnut-shaped Mcm2-7 is 
structurally and functionally different from 
other helicases, because it is the only known 
hexameric helicase to be derived from six 
different subunits (Mcm2 to Mcm7) instead 
of from six copies of the same subunit. This 
feature has allowed portions of the complex 
to evolve extra, specialized functions that are 
thought’ to be crucial to the enzyme’s ability 
to load onto DNA and to activate its unwind- 
ing activity — two landmark regulatory events 
during DNA replication. 

Two structures containing Mcm2-7 have 
been described previously. One represents the 
CMG complex®”, which is active during the 
phase of DNA replication known as elonga- 
tion, when complementary DNA is synthe- 
sized for each existing strand. The complex 
contains one Mcm2-7 hexamer and two other 
essential replication factors that activate the 
enzyme’s DNA unwinding ability. By contrast, 
the second structure*” is an inactive form of 
the enzyme, which has been isolated from cells 
before they replicate. This structure contains 
two Mcm2-7 hexamers in a head-to-head ori- 
entation, enclosing dsDNA in the central chan- 
nel. Helicase structures such as this Mcm2-7 
double hexamer (Mcm2-7 DH) are rare, and 
so its purpose has been a cause for debate. 

One reasonable conjecture is that the 
Mcm2-7 DH participates in DNA melting. 
Whereas DNA unwinding enlarges a pre- 
existing single-stranded DNA (ssDNA) region 
during elongation, DNA melting, which is an 
earlier process, initiates replication by locally 
transforming dsDNA into ssDNA. Local melt- 
ing provides a site for the subsequent assembly 
of a DNA replication fork — the full comple- 
ment of proteins that enable duplication of the 
genetic material’®. Although melting has been 
well studied in bacteria, little is known about 
how it occurs in eukaryotes’. Li and colleagues’
structure, when combined with other data, is
highly consistent with a role for the Mcm2-7
DH in DNA melting, for several reasons.

First, both the current study and a previous 
one’ demonstrate that the two Mcm2-7 hexa- 
mers in the Mcm2-7 DH are offset along the 
long vertical axis of the hexamer, at a 14° tilt 
relative to one another (Fig. 1a). This offset 
restricts the dimensions of the central chan- 
nel (Fig. 1b). Although DNA is not visible in 
the authors’ structure, these data suggest that 
dsDNA will be kinked at the interface between 
the two hexamers. Sharp DNA bending is 
known to cause local DNA melting”, and may 
contribute to the unwinding of dsDNA during 
transcription”. Thus, a DNA kink between the 
two Mcm2-7 hexamers could serve to initiate 
DNA melting. 

Second, although helicases normally 
interact productively only with ssDNA, a 
specific form of the Mcm complex (Mcm467) 
has been shown to bind to and translocate 
along dsDNA”. This is consistent with a poten- 
tial role for Mcm2-7 in manipulating dsDNA 
during melting. Finally, unlike bacterial hexa- 
meric helicases, some viral replicative helicases, 
such as papillomavirus E1 and simian virus-40 
(SV40) large T-antigen, initially form dsDNA- 
containing DHs that resemble the Mcm2-7 
DH (refs 14, 15). These structures locally melt 
DNA and then uncouple into single hexamers 
to unwind DNA during elongation. 

How might dsDNA melting occur? 
Electron microscopy indicates that the SV40 
large T-antigen DH can act as a pump”, in 
which dsDNA enters each hexamer from 
flanking regions and ssDNA is extruded in 
‘rabbit ear’ structures at the interface between 
them™. Consistent with such a mechanism 
in eukaryotes, the misalignment of the two 
hexamers in the Mcm2-7 DH creates two exit 
channels at the hexamer interface through 
which rabbit ears might be extruded (Fig. 1c).
Thus, the Mcm2-7 DH might melt DNA in a
manner analogous to melting on SV40 large 
T-antigen, with local unwinding of the bent 
DNA forming a highly flexible hinge to facili- 
tate ssDNA extrusion. Such a model had been 
proposed to explain Mcm2-7 DNA unwinding 
during elongation’’. Because the Mcm2-7 DH 
seems to be enzymatically inactive, further 
research will be needed to identify the factors 
required to activate the DH for melting, as well 
as to determine how the individual Mcm2-7 
hexamers physically uncouple and are remod- 
elled into the ssDNA-bound form needed for 
elongation. 

Given the technical advances in cryo-electron
microscopy, a flood of high-resolution
structures should become available in the 
near future. However, such structures provide 
only a static glimpse of the target protein, a 
particularly limiting problem for the study 
of dynamic processes such as DNA replica- 
tion. Because Mcm2-7 is only one of many 
molecular motors involved in DNA replica- 
tion, understanding the dynamic nature of 
their interactions is essential for a complete 
understanding of DNA replication. To this 
end, single-molecule studies using reconsti- 
tuted eukaryotic replication systems”"® have 
begun to shed much-needed light on the 
dynamics of this process. Together, these var- 
ied experimental approaches should yield a 
holistic understanding of the vital process of 
DNA replication.


PARTICLE PHYSICS


Matter and antimatter scrutinized


A search for differences in the charge-to-mass ratio of protons and antiprotons,
conducted at unprecedented levels of precision, results in stringent limits to the 
validity of fundamental physical symmetries. SEE LETTER P.196 


KLAUS P. JUNGMANN 


The standard model’ of particle physics
is considered to be the best physical 
theory that we have. It is built on sym- 
metries and can describe all the experiments 
and observations concerning the known sub- 
atomic particles. However, the model includes 
some 30 free parameters and is not fully 
explanatory. For example, it cannot explain 
a profound mystery of physics and cosmol- 
ogy’, the fact that there is no antimatter in the 
Universe. When matter and antimatter mutu- 
ally annihilated each other following the Big 
Bang, any pre-existing symmetry between 
them was broken. Matter but no antimatter 
was left behind, and we lack a satisfactory 
explanation as to how this occurred’. Research 
on the fundamental differences between parti- 
cles and antiparticles may provide an answer. 
In this vein, Ulmer et al.* (page 196) perform 
a high-precision, comparative study of the 
properties of protons and antiprotons. 

The authors used negatively charged
hydrogen atoms (which represent protons for
technical reasons) and individual antiprotons,
the latter generated by the antiproton decelerator
facility at CERN, Europe’s particle-physics
laboratory near Geneva, Switzerland. These 
species were stored in a sophisticated device 
known as a Penning trap, which consists of 
metal electrodes placed at defined electric 
potentials inside a strong and stable magnetic 
field (Fig. 1). In the trap, which has a diameter 
of just a few millimetres, the motion of electri- 
cally charged particles is similar to that in an 
accelerator such as the Large Hadron Collider 
at CERN, but the energies attained are 10° 
times smaller. 

A particle's cyclical motion in the Penning 
trap has a characteristic frequency (known 
as the cyclotron frequency), which is propor- 
tional to the magnetic field strength and the 
particle’s charge-to-mass ratio. Ulmer et al. 
determined the cyclotron-frequency ratio for 
the antiproton and the negative hydrogen ion, 
alternately recycling the same individual particles
at intervals of a few minutes from each
other in the same experiment. The authors
repeated this procedure 6,500 times within
35 days and scrutinized the results for systematic
errors. Finally, they found that the charge-to-mass
ratios of protons and antiprotons are
equal to within 69 parts per trillion.

Figure 1 | Particle and antiparticle motion. Ulmer et al.’ used a device known as a Penning trap to
measure, under identical conditions, the characteristic cycling frequency of a, antiprotons (p̄) and b,
negatively charged hydrogen ions (H⁻, in lieu of protons; represented as a proton (p) and two electrons
(e⁻)) undergoing circular motion in a magnetic field of strength B (grey arrows), set perpendicular to the
direction of motion. From the cycling frequency, which is the number of cycles (N_p̄ and N_H⁻) that each
particle type completed per unit of time, the charge-to-mass ratios of pairs of individual antiprotons and
negatively charged hydrogen ions were determined. The number of cycles was measured from signals
registered by the trap’s electrodes. After correcting for the difference (ΔN) between N_p̄ and N_H⁻ to take
into account the binding energies and the masses of the two electrons in H⁻ that render it different from a
proton, the authors found that the charge-to-mass ratios of protons and antiprotons are identical with an
accuracy of 69 parts per trillion.
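
The proportionality between cyclotron frequency, field strength and charge-to-mass ratio described above can be made concrete with a short numerical sketch. It is an illustration only, not the authors' analysis: the function names are hypothetical, the 1-tesla field is an arbitrary example value rather than the field used in the experiment, and the negative hydrogen ion is modelled simply as a proton plus two electrons, with the small electron binding energies neglected.

```python
import math

# CODATA 2018 values (SI units).
ELEMENTARY_CHARGE = 1.602176634e-19   # C
PROTON_MASS = 1.67262192369e-27       # kg
ELECTRON_MASS = 9.1093837015e-31      # kg


def cyclotron_frequency(charge, mass, b_field):
    """Return the cyclotron frequency |q| B / (2 pi m) in hertz."""
    return abs(charge) * b_field / (2.0 * math.pi * mass)


def expected_frequency_ratio():
    """Approximate ratio nu(antiproton) / nu(H-) if CPT symmetry holds.

    Assumes the antiproton has exactly the proton's mass and charge
    magnitude, and models H- as a proton plus two electrons. The small
    electron binding energies are neglected in this sketch.
    """
    hydrogen_ion_mass = PROTON_MASS + 2.0 * ELECTRON_MASS
    # Both species carry charge -e and sit in the same field, so charge
    # and field strength cancel and only the mass ratio remains.
    return hydrogen_ion_mass / PROTON_MASS


if __name__ == "__main__":
    b_field = 1.0  # tesla; illustrative value, not the experiment's field
    nu = cyclotron_frequency(-ELEMENTARY_CHARGE, PROTON_MASS, b_field)
    print(f"antiproton cyclotron frequency at {b_field} T: {nu / 1e6:.3f} MHz")
    print(f"expected nu(antiproton) / nu(H-): {expected_frequency_ratio():.9f}")
```

Because the field strength cancels in the ratio, comparing the two species in the same trap removes the need to know the field exactly, which is one reason a frequency-ratio measurement can reach such precision. A measured ratio that departed from the expected value by more than the experimental uncertainty would indicate a difference between the proton and antiproton charge-to-mass ratios.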

This result is four times more accurate than 
previous measurements’ of these ratios, and 
has implications for the validity of fundamental 
physical symmetries and theories that have been 
proposed to address unexplained aspects of the 
standard model. Symmetries have a central 
role in physics. A symmetry that holds across 
the Universe is an indication that a conservation 
law is at work. For example, adjusting a clock by 
an arbitrary time interval leaves all physical pro- 
cesses completely unaffected. A consequence 
of this is that energy can neither be created nor 
destroyed. But when a symmetry is violated or a
quantity is not conserved, a symmetry-breaking 
process must be at work. 

In the process known as nuclear β-decay,
for instance, a neutron is transformed into a 
proton, an electron and an antineutrino, but 
only antineutrinos of ‘right-handed’ nature 
appear. As a consequence, the electron is emit- 
ted into a preferred direction with respect to 
the neutron spin. This asymmetry is an exam- 
ple of parity (P) violation’, which means that 
β-decay would not proceed in exactly the same
way in a mirrored version of the world. Simi- 
lar symmetry violations are observed only in 
some processes that involve the weak force. 
They can appear if the signs of electric charges 
are reversed (charge conjugation, C), or if the 
arrow of time changes direction (time rever- 
sal, T). Symmetry violations also occur when 
the combination of C and P symmetries (CP 
symmetry) breaks down; these become evident 
for physical processes that occur differently 
when the signs of charges and handedness are 
changed simultaneously. 

The physicist Andrei Sakharov offered’ an 
explanation for the observed dominance of 
matter, based on such a CP-symmetry violation.
However, the known CP-violating
processes are not sufficient to explain the
preponderance of matter over antimatter.
Furthermore, at current levels of precision, no 
physical process has been found to violate the 
combination of C, P and T symmetries (CPT 
symmetry), which relates to fundamental 
physical principles. In quantum mechanics, 
for example, this combined symmetry ensures 
that particle spins take only integer and half- 
integer values. Moreover, the invariance of 
physical laws in different moving frames of 
reference (known as the Lorentz invariance) 
implies CPT symmetry*”. 

Physicist Alan Kostelecky and colleagues 
have suggested that a violation of this 
symmetry might provide an alternative expla- 
nation for the missing antimatter’’. Unlike 
Sakharov’s model, which requires the disap- 
pearance of antimatter in the early, thermally 
unstable Universe, the latter model does not 


have this additional stringent condition. Under 
CPT symmetry, particles and antiparticles are 
strictly identical except for the sign of their 
charge. Ulmer and colleagues’ measurements 
of the proton and antiproton charge-to-mass
ratios place limits on the differences between
the properties of particles and antiparticles 
and establish a tighter boundary on a possible 
CPT-symmetry violation. 

The charge-to-mass ratios measured by the 
authors do not vary by more than 720 parts 
per trillion during a sidereal day, which is the 
duration of a day with respect to the fixed posi- 
tions of stars rather than to the Sun. Therefore, 
this level of accuracy excludes a violation of the 
CPT symmetry or of the related Lorentz invar- 
iance that could be attributed to a preferred 
frame of reference, such as the one provided 
by the cosmological microwave background 
(the Big Bang’s relic radiation). It should also 
be noted that because the cyclotron frequency 
measurements took place in Earth’s gravi- 
tational field, any difference in the way that 
protons and antiprotons interact with gravity 
would modify their respective frequencies”. 
However, the authors found no such difference 
larger than 870 parts per billion. This means 
that the weak equivalence principle — which 
states that all bodies in a given gravitational 
field undergo the same acceleration indepen- 
dently of their properties — holds at this level 
of accuracy. 

Ulmer and colleagues’ experiment has 
improved our understanding of fundamental 
physical principles by placing important lim- 
its on several processes. This experiment is a 
highlight of research on the central question 
of the prevailing matter-antimatter asym- 
metry, which the researchers approach by 
a promising route. Apart from the authors’ 
tests of the CPT-symmetry invariance, there 
are other experiments” that have searched 
for violations of the CP and T symmetries. 
The search for the former typically involved 
precise measurements of particle properties, 
including antiprotonic systems. The hunt for 
the latter included searches for the elusive 
permanent electric dipole moments of parti- 
cles, and research on the correlations in the 
parameters of β-decaying nuclei and their
decay products, such as neutrinos, electrons 
and daughter nuclei. 

Highly precise experiments at low energies, 
such as this, are complementary to searches for 
evidence of fundamental symmetry violations 
in high-energy particle colliders. There is still 
no indication whether CPT- or CP-symmetry 
violations may be responsible for the matter- 
antimatter asymmetry and for any possible, but 
as yet unknown, differences between particles 
and antiparticles. Scientists therefore look for- 
ward to improved results from ongoing, well- 
motivated precision experiments’, involving 
antiprotons in particular’, which sustain the 
attack on one of the most intriguing questions 
in physics.


It takes two to untangle


Yeast require the enzyme Hsp104 to untangle protein aggregates, which arise in 
stressed or aged cells. Animals lack Hsp104, but it emerges that proteins of the 
DNAJ family of molecular chaperones can fulfil this role. SEE LETTER P.251 


HARM H. KAMPINGA 


The intracellular protein quality-control
network ensures that proteins
fold properly or are soon degraded
when damaged or no longer needed’. When 
the quality control fails, proteins can clump 
together in aggregates — a phenomenon 
associated with stress and ageing, and with 
many neurodegenerative diseases, several 
cardiac- and skeletal-muscle diseases and 
diabetes type II (ref. 2). In yeast, a molecular
chaperone called heat-shock protein 104
(Hsp104) mediates disaggregation’, thus 
maintaining cellular health. But although 
several observational studies have suggested 
that animals have the potential to disaggregate
proteins’, they lack a functional Hsp104
equivalent. Furthermore, in vitro disaggregation
using human molecular chaperones
has proved inefficient’. On page 251
of this issue, Nillegoda et al.° show that pro- 
tein disaggregation in animals is mediated 
by synergistic cooperation between differ- 
ent members of another class of molecular 
chaperone, the DNAJ proteins. 

In both yeast and animals, Hsps prevent the 
formation of aggregates by binding to hydro- 
phobic stretches of amino acids. Small Hsps, 
together with DNAJ proteins, capture unfolded 
or misfolded proteins and maintain them in a
soluble state. These captured clients can then 
be transferred to proteins of the Hsp70 fam- 
ily, which mediate refolding or degradation, 
thus preventing aggregation’. DNAJ proteins 
— the largest group of molecular chaperones, 
with 22 members in yeast and more than 50 in 
humans — are thought to play their part in 
this process by directing Hsp70 to specific 
clients’. In addition, individual DNAJ family 
members assist Hsp70 in Hsp104-dependent
protein disaggregation in yeast. However, the
role of DNAJ proteins in the solubilization of 
aggregates in animals has been enigmatic. 

DNAJ proteins are divided into A, B and C 
classes, of which DNAJA and DNAJB in par- 
ticular have been implicated in protein qual- 
ity control after stress’. The two classes are 
thought to interact with the Hsp70 machine 
separately from each other, chaperoning dif- 
ferent types of client’. But if, and how, a com- 
bination of proteins of different DNAJ classes 
might act in tandem had not previously been 
addressed. 

Using preformed, heat-aggregated model 
proteins, Nillegoda et al. show that DNAJAs 
and DNAJBs accelerate protein disaggrega- 
tion synergistically through a mechanism that 
is distinct from their classical role in protein 
folding. In a series of experiments, the authors
demonstrated that, rather than acting sequen- 
tially, the different DNAJ classes act in parallel 
with one another, and together with Hsp70, to 
mediate disaggregation. 

DNAJ proteins are known’ to interact 
with Hsp70 through their evolutionarily 
conserved J-domains, and with their targets 
through variable carboxy-terminal domains 
(CTDs). Intriguingly, the authors report that 
the synergistic relationship between DNAJA 
and DNAJB during disaggregation depends 
on interactions between the J-domain of one 
protein and the CTD of the other (Fig. 1). This 
interaction is independent of the motif called 
HPD, through which DNAJ interacts with 
Hsp70, but is instead mediated by conserved, 
differently charged regions in the J-domain 
and CTD in each protein. Most DNAJ proteins 
studied so far act as homodimers (pairs of the 
same protein)’, and Nillegoda and colleagues 
propose that, in the disaggregation complex,
a DNAJA homodimer binds a DNAJB
homodimer. One protein from each homodimer
engages in this interaction, and the other
is free to interact with an Hsp70 molecule.
Thus, two Hsp70 molecules can be recruited to
each complex.

Figure 1 | Protein disaggregation by DNAJ proteins. Nillegoda et al.° investigated the mechanism
underlying protein disaggregation in animals. They report that two classes of DNAJ protein, DNAJA and
DNAJB, acting as homodimers, use their carboxy-terminal domains (CTDs) to bind to aggregates. One
molecule of the DNAJA homodimer binds to CTD1 of the adjacent DNAJB through a J-domain, and
vice versa. The other subunit of each homodimer uses its J-domain to bind to one Hsp70 molecule, recruiting
Hsp70 to bind to the aggregate. Entropic pulling forces, driven by Hsp70 binding to and hydrolysing ATP
molecules, untangle a polypeptide molecule from the aggregate. The released polypeptide can then be folded
into a normal protein. The DNAJ-Hsp70 complex breaks apart, and the process begins again.

The authors suggest that the hydrolysis 
of ATP molecules by Hsp70 brings about 
entropic pulling, a process that untangles poly- 
peptide molecules from the aggregate, allow- 
ing them to be refolded or degraded (Fig. 1). 
Finally, Nillegoda et al. demonstrate that the 
complex works most efficiently when the pre- 
formed aggregates have been generated in the 
presence of small Hsps. These proteins are 
excellent ATP-independent chaperones, and 
can join strings of protein monomers under 
conditions of acute stress, incorporating these 
unfolded proteins into an aggregate — a pro- 
cess that makes the aggregates themselves 
more accessible for disaggregation reactions’. 

A basic question that remains to be resolved 
is why animals lost Hsp104 and evolved to use 
DNAJs for disaggregation instead. One specu- 
lation is that Hsp104 is rather promiscuous — 
it will efficiently break down any aggregated 
protein complex. In animal cells, this would 
include complexes that are necessary for cellu- 
lar function, such as RNA granules, which are 
assembled by controlled aggregation and are 
needed to transport RNA along neurons’. By 
contrast, the complex discovered in this study 
seems to allow controlled, substrate-specific 
disaggregation. 

The existence of an animal disaggregating 
complex provides an explanation for pre- 
viously observed protein-disaggregation 
activity in mammals’. Furthermore, the com- 
plex acts in a similar way to the specialized 
DNAJ-Hsp70 combinations that disassemble 
normal, non-pathological protein complexes 
(for example, interactions between Hsp70 
and the DNAJ auxilin mediate the removal 
of clathrin protein from certain vesicles”’). 
Given the large number of DNAJ proteins 


in animals, the authors’ findings suggest 
that many combinations of DNAJs might 
interact, to make complexes that untangle 
specific clients. 

Nillegoda and colleagues’ data are restricted 
to tests on heat-induced protein aggregates, 
but it will be important to examine if and how 
DNAJ-Hsp70 complexes act on the various 
types of aggregate that cause human degenera- 
tive disorders’. In such diseases, only chaper- 
ones that inhibit the formation of aggregates 
have been described so far’’. Inhibition of 
aggregation might delay the onset of disease, 
but would require a continual treatment that
begins before the disease arises, an unfeasible
strategy for diseases of spontaneous or
unknown origin. By contrast, targeting disaggregation
activity after the first symptoms
of disease become apparent could be of therapeutic
value. It will be necessary to determine
the stage in vivo at which the DNAJ-Hsp70
complex acts to untangle aggregates brought
about by different disease-associated proteins;
to define the different DNAJ combinations
involved; and to test whether the complex can
alleviate aggregation-associated toxicity and
thus halt disease progression.


LONGEVITY


Mapping the path to a longer life


Inhibiting the PI3K branch of the cell signalling induced by insulin and insulin- 
like growth factor can extend lifespan. The finding that inhibiting the RAS 
branch also extends lifespan in flies suggests a new target for anti-ageing drugs. 


MORRIS F. WHITE 


Understanding how cellular nutrient-sensing
and homeostasis affect an organism’s
lifespan and susceptibility
to cancer and degenerative diseases is clini- 
cally important but scientifically difficult. 
Work in model organisms, first in nematode 
worms’ and then in fruit flies’, has established 
that increasing the activity of the transcription 
factor FOXO, by reducing insulin and insulin- 
like growth factor signalling (IIS), can protect 
against cellular damage and ageing. Writing 
in Cell, Slack et al.’ now show that selective
mutation of the fruit fly IIS adaptor protein
Chico to disrupt either of the two major IIS 
pathways — the RAS-ERK or the PI3K-AKT 
cascades — can extend the flies’ lifespan. They
find that inhibiting ERK signalling activates 
the transcriptional-repressor protein AOP, 
which, surprisingly, extends lifespan by just as 
much as the activation of FOXO that occurs 
during PI3K inhibition. This finding may have 
uncovered a new target for extending lifespan 
that lacks the adverse effects associated with 
PI3K inhibition, which include dysregulated 
metabolism, reduced growth and infertility. 
From a clinical perspective, a reduced IIS
pathway that extends lifespan is different from
the resistance to the hormone insulin that 
leads to metabolic syndrome — the medi- 
cal term for a combination of diabetes, high 
blood pressure and obesity. The most frequent 
misconception is that increasing the amount 
of circulating insulin to overcome insulin 
resistance is a healthy goal. Although increas- 
ing circulating insulin levels can prevent high 
blood glucose levels and slow the progression 
to type 2 diabetes, it is associated with obesity,
abnormal lipid levels and cardiovascular
disease. Moreover, at the cellular level, chronic
high insulin or long-term insulin therapy can 
retard tissue repair and maintenance by inhib- 
iting autophagy, the process that removes 
damaged intracellular proteins. How to safely 
achieve a longer lifespan by reducing IIS in 
the face of high-sugar diets that promote insu- 
lin resistance and increase insulin levels is a 
challenging conundrum. 

The fruit fly Drosophila melanogaster is 
useful for investigating the relationship 
between the IIS pathway and lifespan because 
it has a short life cycle and its genome encodes 
single copies of many IIS components. How- 
ever, translating discoveries in fruit flies to 
other organisms can be tricky, because in 
people and other animals the IIS pathway is 
distributed between two homologous recep- 
tors (InsR and IGF1R), three or four adaptor 
proteins (IRS1, IRS2, IRS3 in rodents, and 
IRS4) and several effector proteins of the 
MAPK, PI3K, AKT and FOXO families. Some
variants in the genes encoding IGF1R* and 
FOXO3A° have been associated with human 
longevity; however, complete loss of InsR or 
IGF1R function is fatal after birth for mice and 
people. Yet inactivation of InsR in murine adi- 
pose tissue extends lifespan®, as does deletion 
of IGF1R in the brain’. Thus the window for 
increasing lifespan by modulating IIS in
mammals seems to be tissue specific. 

The natural loss of growth-hormone 
receptors in humans with Laron syndrome 
(a type of dwarfism) causes obesity and 
reduces circulating insulin and IGF1, as 
well as decreasing the incidence of diabetes 
or cancer®. Notably, mice without growth- 
hormone receptors share similar traits and 
are the longest-lived laboratory mouse 
strain. Clearly, it is essential to understand 
the tissue-specific effects of reduced IIS in
mammals to modulate lifespan with the 
fewest possible adverse effects. 

A major problem with reduced IIS is the 
risk of dysregulated metabolism and growth 
associated with inhibiting the PI3K cascade’. 
Despite this, efforts to target the PI3K branch 
of IIS in ageing might be successful with a bet- 
ter understanding of which protein isoforms 
to target and with improved inhibitors. In the 
meantime, Slack et al. find that exposing flies 
to the small molecule trametinib, currently 
used for cancer therapy, achieves similar lifespan
extension to the inhibition of the PI3K
pathway. Trametinib inhibits the ERK branch
of the IIS pathway by inhibiting the protein
kinase enzyme MEK (Fig. 1).

Figure 1 | IIS and lifespan. Insulin and insulin-like
growth factor signalling (IIS) is triggered
by binding of the insulin receptor (InsR) or
insulin-like growth factor 1 receptor (IGF1R)
and activation of an adaptor protein, which can
be one of three or four IRS proteins in mammals
or the Chico protein in Drosophila fruit flies. The
pathway then splits into two main branches. One
activates the proteins PI3K and AKT to cause
inhibition of the transcription factor FOXO, an
activator of gene transcription. The other branch
activates the protein RAS, which then triggers a
cascade involving RAF, MEK and ERK to inhibit
the transcription factor ETV6 (or its equivalent in
flies, AOP), a repressor of gene transcription. Slack
et al.’ show that, in Drosophila, mutating chico to
disrupt either of these pathways, or inhibiting the
RAS-ERK pathway by treating flies with the MEK
inhibitor trametinib, leads to extended lifespan.
Adapted from ref. 3 (CC BY 4.0).

The authors also show that the extension of 
fly lifespan by ERK or PI3K inhibition is not 
additive, which suggests that the two branches 
of the pathway might converge on modulating 
the expression of common genes that regulate 
lifespan”. Inhibition of ERK activates AOP, 
whereas inhibition of PI3K activates FOXO; 
both transcription factors do indeed bind a 
common subset of genes, but the exact targets 
that control lifespan are unknown”? (Fig. 1). 
Moreover, FOXO is usually a transcriptional 
activator, whereas AOP is a repressor that 
opposes the activity of another factor, PNT, in 
Drosophila. Interestingly, coactivation of FOXO 
and PNT can have detrimental effects that are 
attenuated by AOP”, indicating that favourable 
crosstalk between AOP and FOXO might mod- 
ulate common genes needed to extend lifespan. 

Despite the potential to bypass PI3K 
inhibition, it remains to be investigated 
whether inhibiting the ERK cascade can extend 
mammalian lifespan without any adverse 
effects. ERK is a member of the MAPK enzyme
family, which mediates cellular responses to a
wide range of extracellular cues to regulate cell 
growth, differentiation and survival. Although 
MEK inhibition has been shown"' to improve 
glucose tolerance in diet-induced obese mice, 
the tissues in which this is beneficial are ill 
defined. Further work is needed to establish 
whether inhibiting the ERK signalling branch 
is a plausible mechanism-based strategy 
for extending lifespan, especially when started 
in adults. 

While work continues to devise medical 
strategies to extend lifespan, calorie restriction 
remains the best-known way to increase life- 
span in yeast, nematodes, fruit flies, rodents 
and some primates”. Calorie restriction can 
reduce the progression of age-related dis- 
eases, including obesity, insulin resistance, 
type 2 diabetes, cardiovascular disease and 
cancer, but it is difficult for people to use 
in the long term and can be dangerous if 
unmonitored or used to excess; furthermore, 
its beneficial effects on human lifespan are 
unproven”’. Recent work suggests that a 
fasting-mimicking diet that produces inter- 
mittent, brief bouts of calorie restriction can 
produce health benefits in people and extend 
the lifespan of mice”. 

Both these approaches increase insulin 
sensitivity and reduce circulating insulin and 
IGF1 concentrations, and so reduced IIS might 
be involved in the observed effects. Whether 
medical strategies — such as using the drug 
rapamycin to inhibit the enzyme TOR", the 
antidiabetic drug acarbose to mimic calorie 
restriction”, or inhibiting the ERK cascade 
as described by Slack and colleagues — can 
exploit or augment the underlying molecular 
mechanisms engaged by calorie restriction and 
reduced IIS in the face of nutrient excess is an 
important area for future investigation.

Universal allosteric mechanism for Gα activation by GPCRs

Tilman Flock, Charles N. J. Ravarani, Dawei Sun, A. J. Venkatakrishnan, Melis Kayikci, Christopher G. Tate,
Dmitry B. Veprintsev & M. Madan Babu

G protein-coupled receptors (GPCRs) allosterically activate heterotrimeric G proteins and trigger GDP release. Given
that there are ~800 human GPCRs and 16 different Gα genes, this raises the question of whether a universal allosteric
mechanism governs Gα activation. Here we show that different GPCRs interact with and activate Gα proteins through a
highly conserved mechanism. Comparison of Gα with the small G protein Ras reveals how the evolution of short
segments that undergo disorder-to-order transitions can decouple regions important for allosteric activation from
receptor binding specificity. This might explain how the GPCR-Gα system diversified rapidly, while conserving the
allosteric activation mechanism.


G proteins bind guanine nucleotides and act as molecular switches
in a number of signalling pathways by interconverting between
a GDP-bound inactive and a GTP-bound active state’’. They
consist of two major classes: monomeric small G proteins’ and heterotrimeric
G proteins*. While small G proteins and the α-subunit (Gα) of
heterotrimeric G proteins both contain a GTPase domain (G-domain),
Gα contains an additional helical domain (H-domain) and also forms a
complex with the Gβ and Gγ subunits. Although they undergo a similar
signalling cycle (Fig. 1), their activation differs in one important aspect.
The guanine nucleotide exchange factors (GEFs) of small G proteins are
largely cytosolic proteins, whereas the GEFs of Gα proteins are usually
membrane-bound GPCRs. While GEFs of small G proteins interact
directly with the GDP binding region'’, GPCRs bind to Gα at a site
almost 30 Å away from the GDP binding region’ and allosterically trigger
GDP release to activate them.

Figure 1 | Gα signalling states and activation. Heterotrimeric G proteins (1)
release GDP upon binding to a guanine nucleotide exchange factor (GEF;
here, a G-protein-coupled receptor), (2) bind GTP and recruit downstream
effectors, and (3) hydrolyse GTP, promoted by a GTPase activating protein
(GAP), leading to (4) the inactive, GDP-bound state. Structures of the Gα
subunit (blue) bound to GDP (Protein Data Bank (PDB) accession 1got;
inactive state; top) and bound to the β2AR (grey) (PDB 3sn6; active state;
bottom) are shown.
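The four-step signalling cycle summarized in Fig. 1 can be pictured as a small state machine. The sketch below is illustrative only: the state and event names (`INACTIVE_GDP`, `bind_gef` and so on) are hypothetical labels chosen for this example, not terminology from the study, and the model ignores kinetics entirely.

```python
from enum import Enum


class State(Enum):
    # Names are illustrative labels for the steps described in the Fig. 1 legend.
    INACTIVE_GDP = "inactive, GDP-bound"
    RECEPTOR_BOUND = "bound to GEF (GPCR), GDP released"
    ACTIVE_GTP = "active, GTP-bound"
    GAP_BOUND = "GAP-bound, hydrolysing GTP"


# Allowed transitions: (current state, event) -> next state.
TRANSITIONS = {
    (State.INACTIVE_GDP, "bind_gef"): State.RECEPTOR_BOUND,   # step 1
    (State.RECEPTOR_BOUND, "bind_gtp"): State.ACTIVE_GTP,     # step 2
    (State.ACTIVE_GTP, "bind_gap"): State.GAP_BOUND,          # step 3
    (State.GAP_BOUND, "hydrolyse_gtp"): State.INACTIVE_GDP,   # step 4
}


def step(state: State, event: str) -> State:
    """Return the next state, or raise if the event is not allowed here."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"{event!r} is not allowed in state {state.name}") from None


def run_cycle(start: State = State.INACTIVE_GDP) -> State:
    """Walk once around the cycle and return the final state."""
    state = start
    for event in ("bind_gef", "bind_gtp", "bind_gap", "hydrolyse_gtp"):
        state = step(state, event)
    return state
```

Calling `run_cycle()` walks the four steps in order and ends back in the inactive, GDP-bound state; an out-of-order event such as `bind_gtp` from the inactive state raises `ValueError`.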

The high-resolution structure of the Gαs-bound β2-adrenergic receptor
(β2AR)° provided crucial insights into the receptor–G protein interface
and conformational changes in Gα upon receptor binding®”. Recent
studies described dynamic regions in Ga,” and Ga;’, the importance of 
displacement of helix 5 (H5) of Ga, and Ga, by up to 6 Å into the
receptor’, the extent of helical domain opening during GDP release”””, 
and identified residues that contribute to Gαi activation”. These studies
focused on single, specific Gα proteins; however, in humans there are
16 different Gα genes, with at least 21 isoforms‘ that can be grouped into
four functional subfamilies (Gαs, Gαi, Gαq, Gα12), which each regulate
different signalling pathways". Although they belong to the same protein
fold, they have diverged significantly in their sequence such that each Gα
protein can be specifically activated by one or several of the ~800 human
GPCRs*. Thus, a fundamental question is whether there is a universal
mechanism of allosteric activation that is conserved across all Gα protein
types’®. Allosteric communication in proteins is mediated through
conformational changes, which are facilitated by the re-organization of
non-covalent contacts between residues. Thus, studying these contacts can
provide detailed insights into the mechanism of allostery'*"’°. On the basis
of a comprehensive analysis, here we propose that GPCRs interact with and
activate Gα subunits through a conserved mechanism. We describe
molecular details of the key structural transitions and pinpoint residues
that constitute the ‘common core’ of Gα activation.


Common Gα numbering and residue contact networks

We created a structural and sequence alignment of 80 Gα structures
from diverse organisms and 973 sequences from 66 species that have a
GPCR-G protein system (auto-activating plant Gα proteins were not
considered; Methods). To enable the comparison of any residue/position
between different Gα proteins, we devised the common Gα numbering
(CGN) system (Fig. 2a). The CGN provides an ‘address’ for every
residue in the DSP format, referring to: (1) the domain (D); (2) the
consensus secondary structure (S); and (3) the position (P) within the
secondary structure element. For instance, phenylalanine 336 in Gαi1 is
denoted as Phe336^G.H5.8 as it is the eighth residue within the consensus
helix H5 of the G-domain. The corresponding position in Gαs is
Phe376^G.H5.8. Loops are labelled in lowercase letters of their flanking
secondary structure elements (SSE); for example, s6h5 refers to the loop
connecting strand S6 with helix H5 (see Extended Data Fig. 1, Methods
and Supplementary Note). A CGN mapping webserver is available at
http://www.mrc-lmb.cam.ac.uk/CGN.

Figure 2 | The common Gα numbering (CGN) system and Gα conserved
contact networks. a, Every position in a Gα is denoted by its domain (D),
the consensus secondary structure element where it is present (S), and position
(P) within the consensus SSE. Names of SSEs are shown in the cartoon; loops
are named with lowercase letters of the flanking SSEs (for example, h1ha;
see Extended Data Fig. 1 for all SSEs). An alignment of all 973 Gα proteins
belonging to all the 4 subfamilies from 66 species allowed the identification of
equivalent residues. An online webserver allows mapping of any Gα sequence
or structure to the CGN system (http://www.mrc-lmb.cam.ac.uk/CGN).
b, Computation of consensus residue contact networks between conserved
residues for different Gα signalling states. See Methods, Supplementary
Note and Supplementary Data.

The Gα structures were assigned to the four major signalling states
(Fig. 1) and the non-covalent contacts between residues were calculated
for each structure. We computed the consensus non-covalent residue
contacts from all Gα structures of the same signalling state. Using the
CGN, we integrated information on evolutionary conservation for every
position and derived the consensus contacts mediated by universally
conserved residues for each signalling state (Fig. 2b). We find that each
step of the signalling cycle undergoes contact re-organization to variable
extents. Since these conformational changes involve conserved residues,
the observed contact re-organization is likely to be universal for all Gα
proteins. A description and additional interpretations are provided in
the Supplementary Note and Supplementary Information. Below, we
describe the major findings pertaining to Gα activation. We first focus
on the GPCR-Gα interface and then describe the molecular details of
how the non-covalent contacts are re-organized and propagated to the
GDP binding pocket, leading to GDP release.
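The two ideas in this section, a common address for every residue and a consensus over the contacts seen in structures of one signalling state, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' pipeline: the dot-separated label form (`"G.H5.8"`), the 90% consensus threshold and the toy contact sets are all choices made for the example.

```python
import re
from collections import Counter
from typing import FrozenSet, Iterable, Set, Tuple

# Assumed textual form of a DSP address: "<domain>.<element>.<position>".
_CGN = re.compile(r"^([A-Za-z]+)\.([A-Za-z0-9]+)\.(\d+)$")

Contact = FrozenSet[str]  # an undirected pair of CGN labels


def parse_cgn(label: str) -> Tuple[str, str, int]:
    """Split a label into (domain, secondary-structure element, position)."""
    match = _CGN.match(label.strip())
    if match is None:
        raise ValueError(f"not a DSP-style label: {label!r}")
    domain, element, position = match.groups()
    return domain, element, int(position)


def contact(a: str, b: str) -> Contact:
    """Build an undirected contact after checking both labels parse."""
    parse_cgn(a)
    parse_cgn(b)
    return frozenset((a, b))


def consensus_contacts(structures: Iterable[Set[Contact]],
                       min_fraction: float = 0.9) -> Set[Contact]:
    """Contacts present in at least min_fraction of the given structures.

    The 0.9 default is an arbitrary illustrative threshold.
    """
    structures = [set(s) for s in structures]
    if not structures:
        return set()
    counts = Counter(c for s in structures for c in s)
    needed = min_fraction * len(structures)
    return {c for c, n in counts.items() if n >= needed}


def between_conserved(contacts: Set[Contact], conserved: Set[str]) -> Set[Contact]:
    """Keep only contacts whose positions are all in the conserved set."""
    return {c for c in contacts if c <= conserved}
```

Because every structure is first mapped to the same addresses, contacts from different Gα proteins become directly comparable, which is what makes a per-state consensus meaningful.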


GPCR-Gα protein interface

Analysis of the buried surface area (BSA) and residue contacts at
the β2AR-Gαs interface’ shows that H5 contributes ~70% (845 Å²;
15 residues) of the total BSA. Other SSEs (s2s3, h4hg, H4, h4s6, S6)
cover ~20% (289 Å²; 14 residues), and the amino-terminal membrane-anchored
helix HN and its loop with strand S1 contribute ~10%
(120 Å²; 5 residues) of the total BSA (Fig. 3a). H5 is the key interface
element that contacts residues in transmembrane helices (TM3, TM5
and TM6) and intracellular loops 2 and 3 (ICL2 and ICL3) of the β2AR’.
Contacts from the other Gα regions are mainly restricted to ICL3 of the
receptor. An analysis of the contacts of the Gαt carboxy-terminal peptides
bound to rhodopsin'”*° shows that conserved residues in H5 tend
to interact with the corresponding, topologically equivalent residues in
rhodopsin (Supplementary Data).

We mapped evolutionary conservation onto the β2AR-Gαs interface
and found that H5 is the only interface region that harbours residues
that are highly conserved across species and Gα protein types (~27% of
H5; 7 residues; Fig. 3b). These residues have significantly higher BSA
compared to the non-conserved H5 interface residues. Several highly
conserved Gα interface residues interact with conserved interface residues
on the GPCR (Extended Data Fig. 2a). Computational energy
calculations show that these residues make the highest interface energy
contribution, suggesting that they are important for complex formation
(Extended Data Fig. 2a). Thus, the universally conserved residues on H5
might form the conserved ‘interaction hotspots’ for different Gα proteins
to interact with their cognate receptors in a similar binding mode.
We also found that two-thirds of the H5 residues are variable but half of
these (8 residues) still contact the receptor. Thus, H5 harbours distinct
sets of interface residues that are either conserved or variable across the
different Gα proteins. The variable positions on H5 together with the
other interface regions (Fig. 3b) could be important for selective coupling
to different receptors, as shown for individual Gα proteins”. This
suggests that the conserved Gα interface positions provide the basis for a
common mode of receptor binding, while the variable positions might
confer selectivity in receptor coupling (Supplementary Note).

Figure 3 | Helix 5 contains the conserved interface region and is comprised
of two modules. a, Inter Gα–GPCR residue contact network (inset) and buried
surface area (BSA) analysis of the heterotrimeric Gαs–β2AR structure (3sn6).
The line width between the nodes (SSE) denotes the number of consensus
residue contacts. GPCR positions are denoted by extending the Ballesteros–Weinstein
numbering system (note parts of ICL3 become extended TM5 in the
active state of the receptor). b, Scatter-plot of Gα sequence conservation and
normalized BSA highlights the conserved and variable interface residues.
c, Consensus contact rewiring between the inactive and the GPCR-bound state
by H5 residues. Positions mediating intra Gα protein contacts (blue) and
receptor-mediated contacts (red) are shown. The circle size represents the
number of contacts. H5 can be divided into transmission and interface module.
The disorder-to-order transition of the H5 C-terminal region upon receptor
binding and SSE contact rewiring are shown.
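The rounded percentages quoted for the interface can be checked from the three buried surface areas given in this section. The short sketch below assumes that those three areas together make up the whole interface; the dictionary keys are shorthand labels for this example.

```python
from typing import Dict

# Buried surface areas in square angstroms, as quoted in the text.
# Assumption: these three groups together account for the total interface.
BSA = {
    "H5": 845.0,
    "s2s3, h4hg, H4, h4s6, S6": 289.0,
    "HN and its loop with S1": 120.0,
}


def bsa_shares(areas: Dict[str, float]) -> Dict[str, float]:
    """Return each region's share of the total buried surface area, in percent."""
    total = sum(areas.values())
    if total <= 0:
        raise ValueError("total buried surface area must be positive")
    return {name: 100.0 * area / total for name, area in areas.items()}


if __name__ == "__main__":
    for name, share in bsa_shares(BSA).items():
        print(f"{name}: {share:.1f}%")
```

Under that assumption the total is 1,254 Å² and the shares come out at about 67.4%, 23.0% and 9.6%, in line with the rounded ~70%, ~20% and ~10% given in the text.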


The role of helix H5 in Gα activation

In the 79 structures of Gα not bound to a GPCR, the C-terminal residues
of H5 are characterized by missing electron density (Extended Data 
Fig. 2b). This region undergoes a disorder-to-order transition and 
extends H5 upon receptor binding as shown for Gagj)/;°°*?°”?. 
Analysis of H5 from 561 full-length Gα homologues suggests that the
higher disorder propensity of the last eight residues compared to the rest 
of H5 is a universal feature (Extended Data Fig. 2b). Within this dis- 
ordered region, hydrophobic positions The? Leu®>?° and Leu'!®”> 
contact the GPCR, fold into a helix and are conserved between human 
proteins and the yeast homologue Gpa1 (~1,200 million years (Myr)
ago). This highly conserved peptide motif” is involved in the disorder- 
to-order transition upon receptor binding and suggests that this struc- 
tural transition mediated by the three key conserved interface residues is 
likely to be a universal feature of all G proteins. 

To understand the effect of GPCR binding on contact re-organization
within Gα, we analysed the consensus contacts in the inactive state (11
structures) and the active state (β2AR-Gαs structure). Although we had
only one structure of the receptor-Gα protein complex, identifying the
differences between the contacts of the active state and the consensus
contacts of the inactive state allowed us to focus on residues that are
re-organized upon receptor binding (and hence likely to be relevant for
all Gα types; Supplementary Note). In all inactive state structures, the
N-terminal part of H5 makes extensive contacts with a number of
SSEs within the G-domain of Gα. Two universally conserved positions
(Phe"!>8 and Val"*”) on H5 contact conserved positions in H1, S2 and
S3 (left lobe), and S5 and S6 (right lobe), respectively (Extended Data
Fig. 3). In the GPCR-bound state, H5 loses 20% of its intra-Gα contacts
(primarily with H1), and gains 27 intermolecular residue contacts with
the GPCR (Fig. 3c). Upon receptor binding, the H5 contacts with the
right lobe are not lost, but are re-organized to accommodate the structural
changes and might be important for the stability of the
receptor-bound complex (for discussion, see Supplementary Note and ref. 25).
Thus, H5 is composed of two highly conserved modules with distinct
functions; that is, an interface module important for receptor binding,
and a transmission module that harbours intra-G-protein contacts,
which are re-organized upon receptor binding (Fig. 3c).
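Comparing the inactive-state consensus with the single receptor-bound structure amounts to a set comparison: which contacts disappear, which persist and which are new. The sketch below shows that bookkeeping on toy data. The position labels are hypothetical placeholders, and the example is constructed so that one of five contacts is lost, simply to mirror the kind of percentage reported in the text.

```python
from typing import Dict, FrozenSet, Set

Contact = FrozenSet[str]  # an undirected pair of position labels


def rewiring(inactive: Set[Contact], bound: Set[Contact]) -> Dict[str, Set[Contact]]:
    """Partition contacts into those lost, kept and gained on receptor binding."""
    return {
        "lost": inactive - bound,
        "kept": inactive & bound,
        "gained": bound - inactive,
    }


def fraction_lost(inactive: Set[Contact], bound: Set[Contact]) -> float:
    """Fraction of inactive-state contacts absent from the bound state."""
    if not inactive:
        return 0.0
    return len(inactive - bound) / len(inactive)


def pair(a: str, b: str) -> Contact:
    return frozenset((a, b))


# Toy data with hypothetical labels: five inactive-state contacts made by H5.
INACTIVE = {
    pair("H5.a", "H1.x"),
    pair("H5.a", "S2.y"),
    pair("H5.b", "S3.z"),
    pair("H5.c", "S5.u"),
    pair("H5.c", "S6.v"),
}
# Bound state: the H5-H1 contact is gone and two receptor contacts appear.
BOUND = (INACTIVE - {pair("H5.a", "H1.x")}) | {
    pair("H5.d", "GPCR.p"),
    pair("H5.e", "GPCR.q"),
}
```

With these toy sets, `fraction_lost(INACTIVE, BOUND)` is 0.2 and `rewiring(INACTIVE, BOUND)["gained"]` holds the two receptor contacts.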

In contrast to H5, the non-conserved interface regions from H4, h4s6
and h4hg undergo less marked re-organization of intra-Gα contacts
upon receptor binding (Extended Data Fig. 3). This suggests that the
conserved mechanism of allosteric activation is primarily mediated by
the movement of H5, thereby breaking the contacts between H5 and H1.
As the residues that form these contacts are conserved in all 16 Gα
proteins, the described contact re-organization is likely to be universal
for all Gα proteins.


The role of helix H1 in Gα activation

In the inactive state, H1 acts as a structural ‘hub’ by linking different
functional regions of Gα. H1 contacts the N-terminal part of H5 (transmission
module), H-domain and GDP through universally conserved
residues (Fig. 4a). In this manner, H1 links the H-domain and the GDP
binding site with the conserved residues in the H5 transmission module,
which in turn is physically linked to the H5 interface module that binds
the receptor. The conserved consensus contacts between H5 and H1
seem essential for the structural integrity of H1. Computational calculations
of the per-residue contribution to protein stability for the 79
non-receptor-bound structures are consistent with their role in stabilizing H1
(Extended Data Fig. 4). Upon GPCR binding, the H5-H1 contacts are
lost, affecting the structural integrity of H1. The contact-mediating
positions in H1 have missing electron density in the β2AR-Gα structure*
and hydrogen/deuterium exchange experiments have shown that this
region is dynamic in Gαs upon receptor binding®. Upon becoming
flexible, the conserved consensus contacts between H1, GDP and the
H-domain hinge region are lost; this results in the loss of a significant
fraction of all contacts made with GDP, thereby weakening its binding
affinity (Fig. 4b), and increases the likelihood of H-domain
opening. Since the entire sequence of H1, the contacting residues on H5,
and the H-domain hinge positions are highly conserved across all Gα
proteins, the mechanism of GDP release is likely to be universal for all
Gα proteins.

Figure 4 | Helix H1 is the key SSE that contacts H5, GDP and the H-domain.
a, Consensus SSE contacts involving H1. The line width between the nodes
(SSE) denotes the number of consensus residue contacts. Upon receptor
binding, H5 is displaced and crucial contacts between H1 and H5 (dark blue)
are lost. This might explain the increased flexibility of H1 in the GPCR-bound
state, which results in the loss of GDP contacts (green) and the H-domain
hinge region (formed by H1, h1ha and HF; light blue). b, The extent of GDP
consensus contacts mediated by the different SSEs. See Extended Data Fig. 4.


Universal mechanism of Gα activation

While variable interface residues in H5 and elsewhere allow specific
binding to distinct GPCRs, we find that H5 primarily harbours conserved
positions that might allow a common mode of receptor binding
and a conserved mechanism of allosteric activation. The contact
re-organization between conserved residues links the disorder-to-order
transition of H5 upon receptor binding to a change in structural stability
of H1, ultimately leading to GDP release. More specifically, H5 is
divided into: (1) the N-terminal transmission module, which forms a
π-π cluster linking H5, S2, S3 and H1 via universally conserved residues
Phe™?*, His, Phe®*° and Phe*?” in the inactive state; and (2) the
C-terminal interface module, which undergoes a disorder-to-order
transition in the intracellular cavity of the receptor via universally
conserved positions. This structural transition results in a displacement of
the H5 transmission module, thereby interrupting the π-π cluster. The
re-organized residues in the cluster (Phe#5* and Phe°?) contact conserved
residues within ICL2 of the receptor (extrapolated Ballesteros–Weinstein
numbering: 3.58 and 3.57 of β2AR’), as confirmed recently
for the CB2 receptor-Gαi complex”®. Since the conserved π-π cluster is
important for the structural integrity of H1, its disruption leads to an
increased flexibility of H1. H1 has a central role in the inactive state by
forming contacts both to GDP (Extended Data Fig. 4) via the Walker A
motif?”’ and to the H-domain (through a ‘cation–π hinge’ motif; Extended
Data Fig. 5). Thus, the increased flexibility due to the partial unfolding of
H1 facilitates GDP release and H-domain opening. The only other conserved
inter-domain contact is an ‘ionic latch’ between the C-terminal loop
of helix HG of the G-domain and the hChD loop of the H-domain. This
contact is broken upon receptor binding, which might be a result of the
reorganization of the right Gα lobe (Extended Data Fig. 3).

In addition to H1, residues around HG and within the s6h5 loop
(TCAT motif) contact the GDP (Extended Data Fig. 3). The conserved
guanine-contacting TCAT motif preserves many of its contacts within
Gα upon receptor binding, although TCAT-to-H1 contacts are lost and
new contacts are formed between H5, S5, S6 and the TCAT motif (that
is, the re-organized right lobe). Likewise, residues that contact the guanosine
moiety, including the interaction between the TCAT motif and
HG, s1h1 (P-loop), S5 and S6, are not extensively re-organized during
Gα activation. Whether this arrangement poises Gα for GTP binding
(which differs from GDP by a single phosphate and whose physiological
concentration exceeds GDP several-fold”), and whether GTP has the
capacity to stabilize H1 on its own and trigger Gα release from the
receptor, remains to be addressed. An analysis of the GTP-bound Gα
reveals that the presence of the third phosphate facilitates additional
contacts with the switch regions (Extended Data Fig. 4c).

Conceptually, the GPCR-bound Gα conformation can be considered
as a metastable Gα transition state that is stable only due to interactions
with the GPCR. Thus, the lost intra-Gα contacts between H1 and the
transmission module of H5 are compensated by the helix extension of
H5, the gained receptor interface contacts, and some re-organized contacts
in the right lobe of Gα (H5 with S5, S6; see role of conserved
Y320°%°? in ref. 25). In this manner, H5 and H1 act as the primary
conduits of information transfer between the receptor interaction interface
(input) and the GDP binding site (output). The residues of the
structural motifs and functional elements described here are conserved
in all the different families of Gα proteins (Extended Data Fig. 5a). Thus,
the mechanism described here is likely to be universal for activation of
all cognate GPCR and Gα protein pairs.

Disease and engineered Gα mutations


We analysed disease-causing mutations in the human population and
found three key positions that are mutated (Supplementary Table 3),
resulting in constitutive Gα activity: (1) a variant of the transmission
module residue Phe'’?* in Gα11 causes autosomal dominant hypocalcaemia
type 2 by becoming constitutively active”, possibly by destabilizing
the contact between H5 and H1; (2) Gαs variant Ala®>3Ser, a
position important for H1 stabilization and GDP binding, causes testotoxicosis
by constitutively activating adenylate cyclase”; and (3)
Gα11 variant Arg'’°Cys causes autosomal dominant hypoparathyroidism”,
possibly by affecting H-domain opening and GDP release.
We also analysed previously performed perturbation experiments on
different Gα types using the CGN system and can explain how these
mutations might affect the activation mechanism (Fig. 5a and
Supplementary Note). Furthermore, comprehensive alanine-scanning
mutagenesis performed on Gαi1, coupled with thermostability assays
of the mutants in the GDP-bound or nucleotide-free state coupled to
rhodopsin”, is also consistent with the analysis performed here
(Fig. 5b). For example, mutations in the H5 interface module mainly
affect Gαi1-rhodopsin complex stability, whereas mutations in the H5
transmission module primarily affect the nucleotide-bound state
of Gα. Alanine mutations in H1 highly destabilize the GDP-bound
state, but not the Gαi1-rhodopsin complex. In addition, mutating
residues in the conserved π-π cluster that interact with Phe™*
significantly affects Gαi1 stability in the GDP-bound form, but not
in the receptor-bound form (see ref. 25). Taken together, our analysis,
the CGN numbering system and the universal activation mechanism
described here provide a unified framework to relate and interpret a
number of independent experimental and disease mutations in different
Gα subfamilies.

Figure 5 | Mutational studies support the universal Gα activation
mechanism. a, Disease and engineered mutations can be explained by the
model of Gα activation in different Gα subfamilies. References'® ***° *>-** are
cited in this figure. b, Comparison of the stability of Gα-GDP (ΔTm; °C) and
the Gαβγ-GPCR complex (change in relative complex stability; %) by Gαi
alanine mutagenesis of every position in H1 and H5. Asterisk indicates that
mutant FA is not stable in the receptor-free state but can still form the
complex with the receptor.


Evolution of Ga activation mechanism 


To understand how allosteric regulation might have evolved in Gα, we
compared the crystal structures of each equivalent signalling state of Gα
and Ras. We found that H5 and H1 have significantly changed their
functional role. In Ras, H5 has a disordered extension (hyper-variable
region) that is post-translationally modified for membrane anchoring”’.
In Gα, the equivalent region forms the GPCR interface module
(the N-terminal disordered region of HN is the membrane anchor”).
The nucleotide-binding N-terminal part of H1 is conserved in its
sequence and its structural orientation in both Ras and Gα (Fig. 6 and
Extended Data Fig. 6). In contrast, the central part of H1, which contacts
the H-domain hinge, is only conserved in Gα. The C-terminal part of H1
has a different role in Ras and Gα. While the last turn of H1 in Ras folds
back to bind the guanine moiety via a conserved π–π stack, the equivalent
region in Gα forms the metastable part of H1 that remains helical in
the GDP-bound inactive state due to contacts with the N-terminal
transmission module of H5. These contacts are missing in Ras, as the
C terminus of H1 and the N terminus of H5 are each three residues
shorter, and H1 in Ras is stable without the interactions with H5. This
means that although Ras and Gα are evolutionarily related and share the
same architecture, minor but crucial differences in the number and
pattern of non-covalent contacts between H5 and H1 have allowed
the emergence of an allosteric mechanism for GDP release in Gα.
Small extensions in H1 and H5 permit H1 to sense whether H5 is bound
to the GPCR. The disordered C-terminal tail of H5 provides both
conserved and variable interface residues that allow for a conserved Gα
activation mechanism and yet permit the evolution of receptor-binding
specificity.


Discussion 


Our analysis suggests that GPCRs interact with and activate Gα subunits
through a highly conserved mechanism in which the interruption of
the contacts between H1 and H5 is a key step for GDP release. In this
sense, while H1 is the molecular switch for GDP release, H5 is the distal
trigger that is ‘pulled’ upon receptor binding. Given that Gα proteins
belong to the same fold, the existence of evolutionarily conserved
residues per se is not surprising. However, the observation that (1) the
conserved residues form a network of non-covalent contacts that links
the GPCR-binding site with the GDP-binding pocket and that (2) this
network of contacts is consistently re-organized upon receptor binding
suggests that this mechanism might constitute the common conserved
set of structural changes for the allosteric release of GDP
(Supplementary Note). While the conserved residue contacts are crucial for
Gα activation, non-conserved positions can still be important for
allosteric activation in distinct Gα proteins’. Thus, the identified residues are
necessary but not sufficient for G-protein activation. The variable
interface residues, as well as the βγ subunits, could have important roles in
receptor binding specificity for individual proteins. Thus, the conserved
universal mechanism probably represents the ‘skeleton’ that can be
incorporated into different contexts in different Gα proteins to maintain
a conserved mechanism of allosteric activation and yet permit specific
binding to the receptor.

A comparison to small G proteins revealed how Gα evolved to bind
GPCRs at a site that is distal to the GDP binding pocket. Emergence of
short regions in H5 and H1 that can undergo structural transitions
seems to have been co-opted to make a new GEF (receptor) interface
and link it to the GDP binding site. In such a system, displacement
of a secondary structure element (H5) upon receptor binding can
transmit information by re-organizing key non-covalent contacts
between conserved residues that connect different secondary structure
elements. Thus, a common ancestor of the GTPase fold might have
provided the structural framework that can be perturbed by GPCR
binding through the interruption of H5–H1 contacts. This ‘new’
allosteric site for GEFs is physically separated from the Gα effector and
regulator binding interfaces (Fig. 6a), and could have provided the basis
for the expansion of the GPCR family without affecting the downstream
signalling factors.

Figure 6 | H5–H1 interaction permits the allosteric activation mechanism.
a, GEF interaction surfaces (red) for Gα (3sn6) and the small G protein human
HRas (PDB 1bkd) in the same orientation. b, H1 and H5 of the inactive
(1got, 4q21) and GEF-bound state (3sn6, 1bkd) for Gα (blue) and HRas (grey).
c, Consensus sequences of equivalent residues of H5 and H1 in Gα and Ras.


Our findings suggest that in addition to evolving extensive interfaces
and allosteric ‘wires’ as observed in other proteins’***, another
solution for allosteric communication is evolving short segments that
undergo disorder-to-order transitions upon a trigger (for example,
receptor binding). This mechanism involves the re-organization of a
network of existing contacts to induce conformational changes that
affect a distal site without the requirement of directly contacting
residues but linked by the same secondary structure (for example, the
interface and transmission module of H5 make distinct sets of contacts
but are linked through the protein backbone; like a puppet on a string).
Since disordered and loop regions tolerate more sequence changes
than structured regions”, an important implication is that such
segments allow for the independent evolution of regions that are
important for binding specificity but still maintain a conserved allosteric
activation mechanism. Generalizing this principle, we suggest that
disordered segments that can undergo structural transitions (regulated
folding or unfolding) and thereby re-organize existing networks of
contacts within structured regions of proteins could have an important
role in other protein families and may be exploited in protein
engineering applications.


Online Content Methods, along with any additional Extended Data display items
and Source Data, are available in the online version of the paper; references unique
to these sections appear only in the online paper.

METHODS


No statistical methods were used to predetermine sample size. 

Generation of sequence and structure data sets. Identification of relevant Gα
protein structures. All structures related to the Gα protein family (Pfam family:
PF00503) were collected from Pfam (release 27.0) and Ensembl” using the R
BioMart interface’. In addition, the identified 973 Gα homologue sequences (see
below) were scanned against the entire Protein Data Bank (PDB) database using
the BLAST algorithm“ to ensure all Gα-containing structures were identified. 91
PDB entries (146 Gα chains) were identified, of which two were obsolete (2pz3
retracted, 2ebc superseded by 3umr). Crystallographic data and coordinate files
were retrieved from the RCSB PDB API (Tuesday 4 February 2014 at 16.00 PST).
Gα structures from the parasite Entamoeba histolytica (4fid) and Arabidopsis
thaliana (2xtz), as well as non-full-length Gα (1aqg and 1lvz are solution NMR
studies of the C-terminal helix of Gα, 3rbq contains the 11 amino acid N-terminal
part of transducin bound to UNC119), were excluded from the analyses. Four
structures of the last 10 C-terminal amino acid residues of Gαt bound to
rhodopsin (2x72, 3dqb and 3pqr) or meta-rhodopsin (4a4m) were used for the
GPCR–Gα interface analysis. Five PDB entries had no publication associated
and were manually traced back to their original articles: 3umr was published in
Johnston et al.” and 4g5o, 4g5q, 4g5r, 4g5s were discussed in a study by Jia et al.”’.
The final set of structures in our analyses span orthologues from human, mouse,
rat and cow and encompass twelve different Gα genes from eight different Gα
subfamilies (GNAI1, GNAI3, GNAO, GNAS2, GNAT1, GNAQ, GNA12,
GNA13), thereby representing all Gα families (Gαs, Gαi, Gαq and Gα12). A full
list of all retrieved PDBs is provided in Supplementary Table 1.

Identification of canonical human Gα protein sequences and paralogue alignment.
All relevant human Gα protein isoforms and variants were obtained from
Ensembl!” using R (full list in Supplementary Table 1). The ‘canonical’ protein
sequences for each of the 16 human Gα genes, as defined by Uniprot™, were used
as representative sequences for each human Gα gene throughout this work. The
sequences were aligned using MUSCLE” and were manually refined using the
consensus secondary structure as a guide (see below). Phylogenetic relationships
of Gα were obtained from TreeFam** (family TF300673). The cladogram of the
alignment of the 16 canonical human Gα proteins was built with the Phylogeny.fr
web service” using the PhyML v3.0 algorithm‘ with the SH-like Approximate
Likelihood-Ratio Test using the Jones-Taylor-Thornton substitution matrix
and TreeDyn” for visualization.

Orthologue alignments of one-to-one Gα orthologues of 16 human Gα genes.
Phylogenetic relationships of Gα sequences were collected from TreeFam”, the
Orthologous MAtrix (OMA) database*®? and EnsemblCompara GeneTrees
(Compara)*! using R scripts. Compara had the highest fraction of complete Gα
sequences for each human Gα gene, except for Gz, for which OMA had a better
sequence coverage. In total, 973 genes from 66 organisms were used, of which 773
were one-to-one orthologues. To build an accurate, low-gap alignment of such a
large number of sequences, 16 independent orthologous alignments for each human
Gα gene were first created by aligning one-to-one orthologue groups using the
PCMA algorithm” followed by manual refinement. Subsequently, each orthologue
alignment was cross-referenced to the CGN (see below) by referencing its
respective human sequence to the human paralogue alignment. Conservation
scores of each CGN position were calculated using both sequence identity and
sequence similarity, based on the BLOSUM62 substitution matrix
(Supplementary Note) using all complete sequences of the cross-referenced alignments
(561 sequences). Sequence conservation was mapped onto PDB structures
(Supplementary Data) and visualized by generating PDB files with B-factors
substituted by conservation scores.
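
As an illustration of the identity-based part of the conservation score described above, the following is a minimal Python sketch (the analysis itself was done with R scripts that are not shown here). It assumes the alignment is given as equal-length strings with `-` or `.` as gap symbols; the function name `column_identity` and the toy sequences are hypothetical, and the BLOSUM62-based similarity score is deliberately not reproduced.

```python
from collections import Counter

GAP_CHARS = {"-", "."}  # assumed gap symbols, not taken from the paper


def column_identity(alignment):
    """Return, for each alignment column, the fraction of non-gap
    residues that equal the most common residue in that column."""
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have equal length")
    scores = []
    for col in range(length):
        residues = [seq[col] for seq in alignment if seq[col] not in GAP_CHARS]
        if not residues:
            scores.append(0.0)  # column contains only gaps
            continue
        _, top_count = Counter(residues).most_common(1)[0]
        scores.append(top_count / len(residues))
    return scores


# Toy alignment (hypothetical sequences, for illustration only)
toy_alignment = ["GAGESGKST", "GTGESGKST", "GAGES-KST"]
scores = column_identity(toy_alignment)
```

Scores of this kind, one per aligned position, are the sort of values that could be written into the B-factor field of a PDB file for visualization.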

Phylogenetic distances. The evolutionary distance of the retrieved sequences
relative to human was evaluated with TimeTree”’. Gα one-to-one orthologues extend
back to chordates (sea squirts; Ciona savignyi and Ciona intestinalis for Gαs),
separated by around 722.5 million years from Homo sapiens, and the most ancestral
one-to-many orthologue extends back to Opisthokonta (yeast; Saccharomyces
cerevisiae), separated by 1,215 million years from human. In this work, we only
investigated G proteins from organisms that have a GPCR–G-protein system.
Since plants do not encode GPCRs and the heterotrimeric G proteins are known
to be auto-activated, we did not consider the plant G proteins in our analysis.

Development of a common Gα numbering (CGN) system. Common Gα
numbering system. Comparative analyses of different protein structures and sequences
to infer general principles of a protein family require a way of relating structural,
genomic, or experimental data from different studies to each topologically
equivalent position on homologous proteins. For GPCRs, the Ballesteros–Weinstein
numbering scheme™ enables the referencing of positions in the transmembrane
helices of different GPCRs, not considering loop regions. We sought to develop a
common G protein numbering (CGN) system that includes loop regions and
describes Gα residues in three levels of detail (DSP), similar to a postal address.
D refers to the structural domain and is optional (catalytic GTPase domain:
G; helical domain: H); S stands for one of the 37 consensus secondary structure
elements (including loops) of the conserved Gα topology; and P relates to
the corresponding residue position within the consensus secondary structure
element mapped to an alignment of all ‘canonical’ human Gα sequences
(Extended Data Fig. 1). For a detailed guide on how to use the CGN and map
any Gα protein, please refer to the CGN webserver (http://www.mrc-lmb.cam.
ac.uk/CGN) and Supplementary Note.

Mapping structures to Uniprot sequences. Since several Gα protein structures
represent chimaeric G proteins, have peptide tags, or contain point mutations,
each residue/position in a PDB structure was mapped to its Uniprot sequence(s)
using the Structure Integration with Function, Taxonomy and Sequence
(SIFTS)°° webserver followed by a manual validation for missing positions.
This allowed assigning residue positions of each Gα structure to their equivalent
positions in the human paralogue alignment and the orthologue alignments.

Determination of domain D and consensus secondary structure S. Secondary
structure assignments were made for each Gα structure using the STRIDE
algorithm*’. The consensus secondary structure elements (SSEs) were determined by
considering the most prominent secondary structure type at each topologically
equivalent Gα position when comparing the secondary structure assignment of all
80 Gα structures (mean and standard deviation of secondary structure type at
each CGN position were calculated). Topologically equivalent positions had a
high agreement in their SSE assignment and showed well-defined flanking
regions (Supplementary Note). In addition, the assigned consensus SSEs were
manually confirmed through a 3D-structure alignment using MUSTANG”, from
which the domains (G-domain and H-domain) were defined. The Gα SSE
nomenclature follows a standardized expansion of the previously defined
nomenclature**: uppercase letters H and S represent helices or sheets, respectively. SSEs
of the G domain follow a numerical identifier (H1, H2, ..., H5 and S1, S2, ..., S6
with the exception of HG and HN); SSEs of the H domain have an alphabetical
identifier (HA, HB, ..., HF), starting from the N- to the C terminus of Gα. The
N-terminal region that forms a membrane-anchored helix is defined as HN.
Systematic identifiers for historical names of some loop regions (switch regions,
P-loop, etc.) were derived by concatenating the flanking SSE names using
lowercase letters; for instance s6h5 refers to the loop between S6 and H5. A reference
table including the historical loop names is provided in Extended Data Fig. 1 and
Supplementary Table 2.
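
The idea of taking the most prominent secondary structure type at each topologically equivalent position can be sketched as a per-column majority vote. The Python snippet below is only an illustration under stated assumptions: each structure is represented by a string of equal length over an assumed alphabet (`H` helix, `S` strand, `L` loop, `-` for unresolved positions), ties are broken alphabetically, and the name `consensus_sse` is hypothetical. It does not reproduce the STRIDE assignment or the mean and standard deviation calculation mentioned above.

```python
from collections import Counter


def consensus_sse(assignments, missing="-"):
    """Majority vote per aligned position.

    Returns the consensus string and, per position, the fraction of
    resolved structures that agree with the consensus type.
    """
    if not assignments:
        return "", []
    length = len(assignments[0])
    if any(len(a) != length for a in assignments):
        raise ValueError("all assignments must have equal length")
    consensus, agreement = [], []
    for col in range(length):
        counts = Counter(a[col] for a in assignments if a[col] != missing)
        if not counts:
            consensus.append(missing)  # no structure resolves this position
            agreement.append(0.0)
            continue
        # sorted() makes tie-breaking deterministic (alphabetical order)
        top = max(sorted(counts), key=counts.get)
        consensus.append(top)
        agreement.append(counts[top] / sum(counts.values()))
    return "".join(consensus), agreement


# Three hypothetical per-structure assignments over five positions
cons, agree = consensus_sse(["HHHLS", "HHLLS", "HH-LS"])
```

The agreement values give a simple way to check that equivalent positions have a consistent assignment across structures.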

Determination of position P. P describes the amino acid position within an SSE, as
determined by mapping the consensus secondary structure to the human
paralogue alignment (Extended Data Fig. 1 and Supplementary Table 2). Insertions in
orthologues are annotated P-i, where i stands for the number of inserted residues
after position P, for instance Arg334(H4.27-2) for the second amino acid of an
insertion after position 27 of helix H4, found in pufferfish (Tetraodon
nigroviridis) Gα (Supplementary Note).
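
To make the D, S, P scheme and the P-i insertion notation concrete, here is a small Python sketch that parses and formats such identifiers. The dot-separated textual form (for example `G.H5.8` or `H4.27-2`) is an assumption made for this illustration, and the names `CGN`, `parse_cgn` and `format_cgn` are hypothetical; the CGN webserver may use a different representation.

```python
import re
from typing import NamedTuple, Optional


class CGN(NamedTuple):
    domain: Optional[str]     # "G" (GTPase domain), "H" (helical domain) or None
    sse: str                  # consensus secondary structure element, e.g. "H5", "s6h5"
    position: int             # position P within the SSE
    insertion: Optional[int]  # i in the P-i insertion notation, or None


_CGN_RE = re.compile(
    r"^(?:(?P<domain>[GH])\.)?(?P<sse>[A-Za-z0-9]+)\.(?P<pos>\d+)(?:-(?P<ins>\d+))?$"
)


def parse_cgn(label: str) -> CGN:
    """Parse an identifier such as 'G.H5.8', 's6h5.2' or 'H4.27-2'."""
    match = _CGN_RE.match(label)
    if match is None:
        raise ValueError(f"not a valid CGN identifier: {label!r}")
    ins = match.group("ins")
    return CGN(
        domain=match.group("domain"),
        sse=match.group("sse"),
        position=int(match.group("pos")),
        insertion=int(ins) if ins is not None else None,
    )


def format_cgn(cgn: CGN) -> str:
    """Inverse of parse_cgn; the domain prefix is omitted when absent."""
    text = f"{cgn.sse}.{cgn.position}"
    if cgn.insertion is not None:
        text += f"-{cgn.insertion}"
    return f"{cgn.domain}.{text}" if cgn.domain else text
```

For example, `parse_cgn("H4.27-2")` yields the SSE `H4`, position 27 and insertion index 2, matching the description of the second residue of an insertion after position 27 of helix H4.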

Consensus non-covalent contact networks between conserved residues.
Non-covalent residue contact networks. Non-covalent contacts between residues of a
protein define its topology, conformation and stability. For each of the 80 Gα
protein structures, a local version of the RINerator 0.5 package from 2014” was
used to calculate H-bonds and van der Waals interactions between residues.
Matrices of the all-against-all atomic distances of all residue contacts within each
structure were computed using R and the bio3d package®. Non-canonical
interactions such as π–π stacking were identified with NCI*. All other calculations,
analysis and processing were performed using custom written scripts in R.

Assignment of Gα structures to signalling states. Structural differences between
Gα seem to arise from a convolution of the conformational state, binding partner,
and Gα protein type and species (Supplementary Note). To identify the
non-covalent contacts of a structure that are crucial for each signalling state, and
independent of the Gα protein type and species, all Gα structures were assigned
to one of the four different Gα signalling states depending on (1) the bound
ligand, and (2) the interaction partner (Supplementary Table 1). The four states
are (1) heterotrimeric GDP-bound state (inactive state), (2) nucleotide-free
heterotrimeric receptor-bound complex (GEF-bound state), (3) GTPγS- and
potentially ‘effector’-bound state (active state), and (4) RGS-bound GDP+ALF
hydrolysis transition state (GAP-bound state). Eleven structures are in the
inactive state, one full-length structure (and four structures of the C-terminal Gα
peptide in complex with a GPCR) in the GPCR-bound state, 25 have GTPγS
bound and/or are co-crystallized with their downstream effectors, and 40
structures have Gα in the GTP-hydrolysis transition state with GDP and aluminium
fluoride (ALF) bound (GDP+ALF) and/or are co-crystallized with their RGS or
a GTP-hydrolysis promoting peptide mimicking the RGS binding interface (for
example, GoLoco motif). Two structures (1cip, 1svs) had non-standard Gα
ligands bound, and 2zjz® did not have a detailed description of its biochemical
relevance, and thus were not assigned to any signalling state. Eleven structures
were identified as chimaeras and 21 included mutations (Supplementary Note).
The publications describing the protein structure of each PDB entry were checked
to confirm the relevance of the assigned signalling state.

Consensus contacts between conserved residues. To compare residue contact
networks (RCNs) from different structures, topologically equivalent positions were
cross-referenced with the CGN system. All RCN analyses, consensus RCN
calculation, and conservation analysis were conducted using customized R scripts:
matrices representing the absence or presence of non-covalent contacts between
each possible pair of CGN residues in each PDB structure were computed
(Supplementary Fig. 1). The consensus contacts of each signalling state were
computed as the probability to find a contact in all structures of the state.
Structure models can differ in their number of equivalent residues due to missing
electron density, not fully fitted models, or truncations for crystallographic
purposes. Thus, each consensus contact probability was normalized by the number of
structures of the state that have the respective residue pair, in order to distinguish
the absence of a contact from the absence of an equivalent position in a single
PDB. To expand the structural analysis to other Gα proteins for which only
sequence data was available, sequence conservation was mapped to each CGN
residue (see above).
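
The normalization described above can be illustrated with a short Python sketch (the original calculations used custom R scripts that are not shown). It assumes each structure of a signalling state is summarized by the set of CGN positions it resolves and the set of contacting position pairs; the data layout and the names `pair` and `consensus_contacts` are hypothetical.

```python
def pair(a, b):
    """Order-independent key for a residue pair."""
    return tuple(sorted((a, b)))


def consensus_contacts(structures):
    """Contact probability per residue pair within one signalling state.

    Each structure is a dict with
      'residues': set of CGN labels resolved in that structure
      'contacts': set of pair(a, b) keys for contacting residues
    The count of structures showing a contact is divided by the number of
    structures that contain both residues, so that a missing residue is
    not counted as a missing contact.
    """
    observed = set()
    for s in structures:
        observed.update(s["contacts"])
    probabilities = {}
    for a, b in observed:
        have_pair = [s for s in structures
                     if a in s["residues"] and b in s["residues"]]
        if not have_pair:  # inconsistent input; skip rather than divide by zero
            continue
        with_contact = sum((a, b) in s["contacts"] for s in have_pair)
        probabilities[(a, b)] = with_contact / len(have_pair)
    return probabilities


# Three hypothetical structures; the second lacks S2.6, the third lacks H5.8
toy_state = [
    {"residues": {"H1.2", "H5.8", "S2.6"}, "contacts": {pair("H1.2", "H5.8")}},
    {"residues": {"H1.2", "H5.8"}, "contacts": {pair("H5.8", "H1.2")}},
    {"residues": {"H1.2", "S2.6"}, "contacts": {pair("H1.2", "S2.6")}},
]
probs = consensus_contacts(toy_state)
```

In this toy case the H1.2–H5.8 contact is present in both structures that resolve the pair, whereas the H1.2–S2.6 contact is present in one of the two structures that resolve it.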

Visualization of consensus contacts and identification of universal structural
motifs. The consensus contacts between conserved residues in the different
signalling states were visualized to investigate the contact re-organization in detail.
For 2D visualization, the respective consensus RCNs were exported to
Cytoscape® using the RCytoscape interface”. For 3D visualization, R was used
to create consensus RCNs in PyMol (The PyMOL Molecular Graphics System,
Version 1.5.0.4 Schrödinger, LLC.) by creating pseudo PDB structure coordinate
files that show residues as spheres from their C-alpha atoms and lines/edges
between them using CONECT entries. Information on sequence conservation
was mapped via the B-factor field of the pseudo PDB structures. For
simplification, only contacts present in more than 90% of all structures with a sequence
identity >90% were shown as ‘consensus contacts’ between conserved residues;
this threshold was chosen based on the bimodal distribution of contact
occurrence (Supplementary Fig. 2). In addition, only long-range interactions (>i + 4)
are shown for the consensus RCNs. It is important to note that these cut-offs were
applied only for visualization purposes, while for the analysis, no cut-off was
needed. All relevant consensus contacts were additionally visually inspected for
each of the 80 PDB structures by creating automated PyMol sessions from R that
superimpose all the 80 structures. To generate RCNs between SSEs, the sum of all
contacts of the respective SSE as defined by the consensus SSE of the CGN was
computed. Chimera® was used to manually re-evaluate atomic contacts, and
PyMol was used to create publication-quality images.
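
The pseudo PDB files described above (C-alpha atoms as nodes, CONECT records as edges, conservation in the B-factor field) can be sketched in Python as follows. This is an illustrative reimplementation under stated assumptions, not the authors' R code: the function name `pseudo_pdb`, the single chain `A`, the occupancy of 1.00 and the toy coordinates are all hypothetical, and the column layout follows the standard fixed-width ATOM and CONECT record formats.

```python
def pseudo_pdb(nodes, edges):
    """Build pseudo PDB text for a residue contact network.

    nodes: list of (res_name, res_seq, (x, y, z), conservation) tuples
    edges: list of (i, j) zero-based indices into nodes
    Each node becomes one C-alpha ATOM record with the conservation score
    in the B-factor columns (61-66); each edge becomes a CONECT record.
    """
    lines = []
    for serial, (res_name, res_seq, xyz, score) in enumerate(nodes, start=1):
        x, y, z = xyz
        lines.append(
            f"ATOM  {serial:5d}  CA  {res_name:>3s} A{res_seq:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}{1.0:6.2f}{score:6.2f}{'C':>12s}"
        )
    for i, j in edges:
        # atom serial numbers are one-based
        lines.append(f"CONECT{i + 1:5d}{j + 1:5d}")
    lines.append("END")
    return "\n".join(lines) + "\n"


# Two hypothetical residues joined by one consensus contact
text = pseudo_pdb(
    [("ARG", 1, (0.0, 0.0, 0.0), 0.95), ("PHE", 2, (3.8, 0.0, 0.0), 0.50)],
    [(0, 1)],
)
```

A file written this way can be opened in a molecular viewer and coloured by B-factor, so that node colour reflects sequence conservation.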

Interface analysis. Buried surface area and inter-Gα–GPCR residue contact
networks. Inter-chain RCNs between Gα and the receptor (Gαs and β2AR chains A
and R in 3sn6; Gαt C-terminal peptide and rhodopsin from chains B and A in
2x72, 3dqb, 3pqr, 4a4m) were calculated as described above. The buried surface
area (BSA) was obtained from the PDBe PISA (Proteins, Interfaces, Structures
and Assemblies)*° XML repository and normalized by the accessible surface area
for each residue position. BSA and Gα–GPCR RCNs were mapped to the CGN
and the Ballesteros–Weinstein numbering, respectively. Sequence conservation
from 561 complete Gα homologue sequences and 249 human non-olfactory class
A GPCRs was mapped onto the interface to determine the conserved ‘hotspot’
residues in the interface and visualized in PyMol (Supplementary Data). The BSA
histogram, the visualization of the residue interaction network per secondary
structure elements, and the correlation of BSA per residue versus conservation
were produced in R and ggplot2.

Force-field-based energy estimations. The per-residue energy contributions to Gα
monomer and Gα–GPCR complex stability were calculated using FoldX 3.0,
which uses energy terms weighted by empirical data from protein engineering
experiments to provide a quantitative estimation of each residue’s contribution to
protein stability and protein complex stability (http://foldx.crg.es/). For the
interface analysis, the 3sn6 structure was energy minimized with the FoldX ‘repair
pdb’ function and subsequently, the per-residue energy contributions for both the
Gαs–β2AR complex and the monomers in isolation were calculated using the
FoldX ‘sequence detail’, ‘analyse complex’, and ‘stability’ functions at 298 K, pH
7.0, and 0.05 M ionic strength. The per-residue energy contributions to complex
stability were calculated as the difference between the energy contributions of
each residue in the monomer and complex (ΔΔGinterface) and visualized with R
(Extended Data Fig. 2a). For energy contributions of each residue within Gα
monomers (Extended Data Fig. 4b), the average energy contribution and
standard deviation for each Gα position was computed after running the FoldX
‘stability’ and ‘sequence detail’ functions at 298 K, pH 7.0, and 0.05 M ionic
strength for each of the 79 non-complex structures.

Disorder propensity calculations for all Gα homologue sequences and structures.
The disorder propensity of each of the 561 complete Gα homologue sequences
was calculated with IUPred®’ (prediction-type-setting: ‘short disorder’). The
missing structure positions were identified with the bio3d package (Extended
Data Fig. 2b).

New and published mutational studies of different Gα classes. Identification of
mutations, mutant structures and chimaeras. Additional literature on Gα
mutations was retrieved with the text mining tool MutationMapper® and manually
validated and filtered for correct hits/search results. Disease mutations were
retrieved from the Database of Single Nucleotide Polymorphisms (dbSNP) and
the Catalogue of Somatic Mutations in Cancer (COSMIC)” with biomaRt®, and
from the Human Gene Mutation Database (HGMD)”. Mutations, chimaeras and
peptide tags in the analysed structures were identified by comparing their Uniprot
sequence to their PDB sequence using SIFTS” and mining PDBe annotations. All
mutation data were mapped back to the CGN and visualized on their respective
human Gα structure.

Alanine scanning and stability of Gαi1. The alanine scanning expression library of
Gαi1 was prepared as reported before’’. The recombinant Gαi1 alanine mutants
were expressed in 24-well plates, purified by standard Ni-NTA affinity
chromatography followed by buffer exchange using 96-well filter plates. The bovine
rhodopsin and βγ subunits were prepared from bovine retinas”. The melting
temperature of each alanine mutant upon addition of GDP or GTPγS was
measured by differential scanning fluorimetry assay. The effect of each alanine Gαi1
mutant on R*–Gi complex formation and complex stability was measured by a
high-throughput assay based on native gel electrophoresis. Detailed methods and
protocols are provided in ref. 25.

Gα versus Ras comparison. The Ras conformational cycle was featured in the
RCSB PDB” April 2012 PDB-101 Molecule of the Month by David Goodsell
(http://dx.doi.org/10.2210/rcsb_pdb/mom_2012_4), with high-resolution
structures showing human HRas in its active GTPγS-bound state (PDB 5p21”) and
the GDP-bound inactive state (PDB 4q21”°). 1bkd’® was used as representative of
the HRas GEF-bound state. These Ras representative structures were combined
with an alignment of all human HRas paralogues identified from the OMA
database*’. A structural alignment between the identified active and inactive
Ras structures and the corresponding active and inactive Gα (1got and 3ums)
was used to accurately map Ras positions to the CGN (Supplementary Data)
despite the low sequence identity (<6%) between Ras and Gα. The number
of atomic non-covalent contacts between helix H5 and helix H1 in the Ras and
Gα structures was manually compared in Chimera for the structures 1got
and 4q21.


[Figure residue: a, alignment of the canonical human Gα sequences annotated
with domain (G-domain, H-domain), consensus secondary structure elements (SSE),
position in SSE and position in the human alignment. b, reference table listing,
for each Gα SSE, its domain, length, start and end positions in the human
reference alignment, and alternative names (for example, P-loop, switch regions,
TCAT). The alignment and table contents were not recoverable from the
extracted text.]
29992 |GNA11_HUMAN 162 VRVPTTGIIEYPFDLENIIFRMVDVGGQRSERRKWIHCFENVTSIMFLVALSE YDQVLVESDNENRMEE SKALFRTI ITY PWFONSSVILFLNKKDLLEDKIL--YSHLVDYF PEFDGPQRDAQAARE - IIYSHFTCATDTENIRFVFAAVKDTILOLNLKEYNLV 359 
095837|GNA14_HUMAN 178 VRVPTTGIIEYPFDLENIIFRMVDVGGORSERRKWIHCFESVTSIIFLVALSE YDQVLAECDNENRMEESKALFKTI ITYPWFLNSSVILFLNKKDLLEEKIM-~YSHLISYFPEYTGPKQDVRAARD-- VIYSHFTCATDTDNIRFVFAAVKDTILOLNLREFNLV 355 


P30679|GNA15 HUMAN 105 SRMPTTGINEYCFSVQKTNLRIVDVGGOKSERKKWIHCFENVIALIYLASLSE YDQCLEENNQENRMKESLALFGTILELPWFKSTSVILFLNKTDILEEKIP——TSHLATYFPSFQGPKQDAEAAKR— 
903113 |GNA12_HuMaN 204 ARKATKGIVEHDFVIKKIPFKMVDVGGORSOQROKWF QCFDGITSILFMVSSSEYDQVLMEDRRTNRLVESMNIFETIVNNKLFFNVSITLFLNKMDLLVEXVK--TVSIKKHF PDFRGDPERLEDVOR- YLVQCFDRK- -RRNRS KPLFHHFTTAIDTENVRFVFHAVEDTILQENLRDIMLQ 351 
914344 |GWA13_HUMAN 199 ARRPTKGIHEYDFEIKNVPFKMVDVGGORSERKRWFECF DSVTSILFLVSSSEFDOQVLMEDRLINRLTESLNIFET1VNNRVF SNVSIILFLNKTDLLEEKVQ-~IVSIKDYFLEFEGDPHCLRDVOK- FLVECFRNK- KPLYHHFTTAINTENIRLVFRDVEDTILEDNLKOQLMLQ 377 
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Extended Data Figure 1 | Human paralogue reference alignment for position in the SSE of the human reference alignment (P) are shown on top of 
common Goa numbering system. a, Reference alignment of all canonical the alignment. b, Reference table of the definitions of SSEs used in the CGN 
human Gz paralogues. The domain (D), consensus secondary structure (S)and nomenclature. 
Extended Data Figure 2 | Energy estimation of the GPCR–Gα residue
contributions and Gα disorder propensity. a, Energy contribution of single
interface residues to the Gαs–β2AR complex calculated with FoldX (T = 298 K,
pH = 7.0, ionic strength = 0.05 M). Conserved Gα residues (blue sequence
logo) that were identified to form receptor–Gα inter-protein contacts with
conserved GPCR residues (red sequence logo) are shown. The contact network
between residues of the β2AR and Gαs is shown (red, conserved receptor
residue; blue, conserved Gα residue; grey, variable residues; spheres represent
Cα positions and links represent non-covalent contacts). b, Consensus disorder
plot for all Gα proteins. The mean value of the disorder propensity of all
full-length Gα sequences (561 sequences; homologous to all 16 human Gα
proteins) is shown as a black line; the standard deviation at each position is
shown as a light red ribbon. The colour tone of the line indicates the number
of gaps at an aligned position (black, no gaps). The left inset shows the
disorder propensity of H1. The right inset highlights that H5 is highly
structured in its N terminus, and has increased disorder propensity towards
the C terminus, which is in agreement with the missing electron density in the
79 structures.
Extended Data Figure 3 | Rewiring of consensus contacts between
conserved Gα residues upon receptor binding. CGN numbers and sequence
logo for consensus contacts within Gα in the inactive state (left) and
GPCR-bound state (right) are shown. Receptor residues are shown in red; H5
residues in dark blue; H1 residues in light blue; and GDP in green. The
domains are highlighted with a blue background (G-domain darker blue,
H-domain light blue). This figure highlights the most important consensus
residue contacts between conserved residues. Additional contacts in the right
lobe are discussed in the Supplementary Note and in ref. 25. For a full list of
residue contacts, please refer to Supplementary Data.
Extended Data Figure 4 | Details of helix H1 linking H5, GDP and the
H-domain. a, This figure expands Fig. 4 from the main text to provide
residue-level details of the role of helix H1. Residues forming contacts with
H5 are shown in blue, with the H-domain in light blue and with GDP in green.
Non-covalent consensus contacts between universally conserved residues at the
SSE level (left) and per residue-level (centre). Lines denote non-covalent
contacts between residues. The degree of conservation is shown as sequence
logo. Residues are numbered according to the CGN. Helix H1 is almost 100%
conserved across all 16 Gα types and forms three structural motifs for
interactions with H5, the H-domain and GDP (right). b, Average per residue
energy contribution to Gα protein stability as calculated from 79 structures
from all four Gα subfamilies in the non-receptor-bound signalling states using
FoldX (T = 298 K, pH = 7.0, ionic strength = 0.05 M). The average energy
contribution is shown as dots, the standard deviation as bars. c, Per residue
detail of Gα–GDP and Gα–GSP (non-hydrolysable GTP analogue) consensus
contacts. The bar-plot shows the frequency of finding a contact mediated by
topologically equivalent positions with GDP/GSP. Number of side-chain
and main-chain contacts are shown as dark grey and light grey bars,
respectively. The degree of conservation of contacting residues (calculated
from the 561 complete Gα sequences) is represented in the right panel and
the consensus sequence for each position is shown.


©2015 Macmillan Publishers Limited. All rights reserved 


ANALYSIS 


[Extended Data Figure 5, panel a. Motifs shown: H1-H5 hydrophobic cluster;
H-domain ionic latch; H-domain cation–π hinge. For each motif the panel shows
the consensus contact between SSEs, the sequence motif and the structural
motif.]

b
Position; Gα type; Experiment; PMIDs
H5; Gi; Changing H5 helicity induces GDP release; 9452438
H5 modules; Gt; Gly-spacer or Ala-spacer between H5 modules decouples G activation from GPCR binding; 11279276, 12033931
Phe; Gt, Gi, G11; Ala and Cys mutations accelerate exchange
Thr; Gt; Ala mutation causes constitutive GDP/GTP exchange; 11356823
Val; Gt; Ala mutation causes constitutive GDP/GTP exchange; 11356823
Extended Data Figure 5 | Conserved structural motifs of Gα and known
disease and engineered mutations. a, A universally conserved cluster of
π–π and hydrophobic interactions between S2 (Phe(G.S2.6)) and S3
(Phe(G.S3.3)), H1 (Met and His) and H5 (Phe) links H5 and H1 in the absence of
the receptor. Upon receptor binding, two Phe residues within this motif
interact with the conserved Pro and Leu of ICL2 of the receptor,
as has been shown for Gαs (3sn6) and Gαi (ref. 26). Interrupting the contacts
between H5 and H1 seems to be the trigger for transmitting the signal of
GPCR binding to helix H1 (which interacts with GDP and the H-domain). The
only conserved residue contact between the H-domain and the G-domain
that is not in the hinge region is formed by a universally conserved salt bridge
(H-domain ionic latch) between the very N-terminal end of HG of the
G-domain (Lys(G.s5hg.1)) and the loop connecting HD and HE in the H-domain
(Asp(H.hdhe.5)). The hinge region is formed by H1, the loop between H1 and HA,
and HF. H1 interacts via (1) a cation–π interaction mediated by a universally
conserved residue with the loop connecting H1 and HA (a Lys of H1 and
Tyr(G.h1ha.4)) and (2) a hydrophobic interaction with HF (a Lys of H1 and a
Leu of HF). b, Disease and engineered mutations that can be explained by the
universal Gα activation mechanism mapped on a Gα protein. Cα positions
of residues are shown as spheres; mutations at green positions cause
spontaneous GDP release by interrupting consensus contacts between
conserved residues, thereby 'mimicking' the effect of receptor binding to Gα.
Pink positions have also been reported to cause disease by constitutively
activating Gα. Insertion of an Ala4 or Gly5 spacer after the yellow position
separates the H5 transmission and interface modules, thereby allowing GPCR
binding without triggering GDP release.
Extended Data Figure 6 | Helix H5-H1 interaction in Gα provides the
allosteric GEF activation mechanism. a, Schematic representation of
structural motifs on H1 that are shared or unique to Gα and Ras. While the part
of H1 with the phosphate-binding motif is conserved across both protein
families, the C-terminal part is conserved only in Gα. H1 in Gα has three
additional residues that allow for extensive residue contacts between H1 and
H5. In Ras, these interactions are missing and H5 and H1 are both 3 residues
shorter. The consensus sequence and secondary structure of equivalent
residues of H1 in Gα and Ras is also depicted. b, Comparison of the residue
contact network between topologically equivalent residues in H5 and H1 in the
corresponding inactive GDP-bound states of Gα (1got) and Ras (4q21).
The weight of the link between SSEs denotes the number of atomic contacts.
c, Sequence alignments of H1 and H5 of human Gα and Ras paralogues. The
sequence alignment was obtained based on cross-referencing the alignments
using the structures of Gα and Ras.
Self-renewing diploid Axin2+ cells fuel
homeostatic renewal of the liver


Bruce Wang, Ludan Zhao, Matt Fish, Catriona Y. Logan & Roel Nusse


The source of new hepatocytes in the uninjured liver has remained an open question. By lineage tracing using the 
Wnt-responsive gene Axin2 in mice, we identify a population of proliferating and self-renewing cells adjacent to the 
central vein in the liver lobule. These pericentral cells express the early liver progenitor marker Tbx3, are diploid, 
and thereby differ from mature hepatocytes, which are mostly polyploid. The descendants of pericentral cells 
differentiate into Tbx3-negative, polyploid hepatocytes, and can replace all hepatocytes along the liver lobule during 
homeostatic renewal. Adjacent central vein endothelial cells provide Wnt signals that maintain the pericentral cells, 
thereby constituting the niche. Thus, we identify a cell population in the liver that subserves homeostatic hepatocyte 
renewal, characterize its anatomical niche, and identify molecular signals that regulate its activity. 


The cellular source of new hepatocytes in the adult liver and the
molecular regulation of hepatocyte renewal are fundamental
unanswered questions in liver biology. Recent studies in mice using
genetic lineage tracing techniques have concluded that during
homeostatic renewal, new hepatocytes arise by replication of
pre-existing hepatocytes. This is in line with the generally accepted view
that in the uninjured state, hepatocyte homeostasis does not involve a
stem cell population. However, hepatocytes are heterogeneous, with
striking differences in age and function across the liver lobule. In
addition, mature hepatocytes are generally polyploid (4N to 32N), a
genomic state that compromises replicative capacity, posing limitations
on possible contributions of these cells to long-term liver homeostasis.
It has been unknown whether a specific subpopulation of
cells serves homeostatic renewal in the liver, as happens in many
other tissues.

Wnt proteins are secreted short-range signals that maintain stem
cells in many adult mammalian tissues, and are produced by the
specialized microenvironment referred to as the stem cell niche.
Wnt proteins signal primarily through the intracellular protein
β-catenin to activate transcription. A universal transcriptional target
of β-catenin-dependent Wnt signalling is Axin2, and its expression
provides a reliable readout of cells responding to Wnt. Genetic
lineage tracing of Axin2+ cells has identified stem cells in several adult
mammalian tissues. We have used this lineage tracing approach
to identify a unique population of Wnt-responsive cells that surround
the central vein. These diploid cells self-renew over the lifespan of the
animal and progressively give rise to mature polyploid hepatocytes
that can populate the entire liver lobule. We also show that these
pericentral cells are maintained by Wnt-producing central vein
endothelial cells that constitute the niche.


Axin2+ pericentral cells generate expanding clones

In the adult liver, Axin2 is expressed in cells located around the central
vein, which we confirmed by in situ hybridization (Fig. 1m). In
order to mark and follow the fates of these Wnt-responsive cells, we
used the tamoxifen-inducible Axin2-CreERT2;Rosa26-mTmGflox
mouse to pulse label Axin2+ cells. In these experiments, a subset of
Axin2+ cells is labelled stochastically with membrane GFP after
tamoxifen administration. The GFP label is permanent, allowing for
fate mapping of initially labelled cells and their descendants. A
single low dose of tamoxifen led to GFP labelling exclusively of
pericentral hepatocytes (Fig. 1a). Control animals receiving corn oil did
not show any GFP labelling (Extended Data Fig. 1). The GFP+ cells
expressed glutamine synthetase (GS), another known Wnt target
gene and a marker for pericentral hepatocytes (Fig. 1b). They were
negative for carbamoyl-phosphate synthase 1 (CPS), which marks
midlobular and periportal hepatocytes (Fig. 1c). Over time, the
population of labelled cells expanded as large contiguous patches spreading
directionally from the central vein towards the portal vein (Fig. 1d,
g, j). One year after the marking, nearly all hepatocytes in some
individual lobules were descendants of the initially labelled Axin2+ cells
(Fig. 1j), including hepatocytes that abut the portal vein (Fig. 1j, inset).

Pericentral cells that remained labelled throughout the course of
the lineage trace maintained their distinct gene expression profile,
expressing Axin2 (Extended Data Fig. 2) and GS (Fig. 1b, e, h, k)
but not CPS (Fig. 1c, f, i, l). Conversely, the descendants of the labelled
cells acquired different gene expression patterns as they moved away
from the central vein. They lost Axin2 and GS expression and gained
CPS expression, suggesting that, as they move away from the pericentral
region, they no longer receive Wnt signals (see below) and
subsequently differentiate. Finally, throughout the lineage traces, all
labelled cells expressed the hepatocyte marker HNF4α (Fig. 1n), but
not markers of other liver cell types including biliary epithelial cells
(data not shown), indicating that Axin2+ cells contribute only to the
hepatocyte lineage.

While Axin2+ cells can generate all hepatocytes in a lobule over
time, quantification of the labelling after one year showed that on
average descendants of Axin2+ cells replaced 30% of the area of
the entire liver (Fig. 1o and Extended Data Fig. 3), accounting for
approximately 40% of the hepatocytes.


Axin2+ cells self-renew

A defining property of stem cells is the ability to self-renew. To test
whether Axin2+ cells self-renew, we labelled a maximum number of
Axin2+ cells by administering five consecutive daily doses of
tamoxifen (Fig. 2a). Over time, the labelled cells expanded concentrically
from the central vein and, importantly, all pericentral cells remained
labelled (Fig. 2b, c). This indicates that new pericentral cells arise
exclusively from pre-existing labelled Axin2+ cells. Thus, while
Axin2+ cells can give rise to all the hepatocytes along the lobule
(Fig. 2c), they are not replaced by unlabelled Axin2+ cells. That is,
pericentral Axin2+ cells are a self-renewing cell population.
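
The logic of this self-renewal test can be illustrated with a deliberately simplified toy model. The sketch below is an illustration under stated assumptions, not the authors' analysis: it treats a lobule as a single row of cells, assumes that only the cell next to the central vein divides, and assumes that each division keeps one daughter in place while displacing the rest of the row by one position towards the portal vein. The function name `trace_lobule` and its parameters are hypothetical.

```python
def trace_lobule(lobule_length, divisions, pericentral_labelled=True):
    """Toy lineage trace along one row of cells.

    Position 0 is next to the central vein, the last position is next to
    the portal vein. True means the cell carries the heritable label.
    """
    row = [pericentral_labelled] + [False] * (lobule_length - 1)
    for _ in range(divisions):
        # The pericentral cell divides: one daughter stays at position 0,
        # the other pushes the row one step towards the portal vein and
        # the cell at the portal end is lost.
        row = [row[0]] + row[:-1]
    return row


if __name__ == "__main__":
    # A labelled pericentral cell: the label spreads outwards and the
    # pericentral position itself stays labelled.
    print(trace_lobule(5, 2))
    # An unlabelled pericentral cell: no labelled cells ever appear.
    print(trace_lobule(5, 2, pericentral_labelled=False))
```

In this toy model the pericentral position remains labelled for as long as the trace runs, which mirrors the observation that unlabelled cells do not take over the pericentral region.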

To further characterize the Axin2+ cell population, we used RNA-seq
to compare the gene expression profile of FACS-isolated Axin2+
and Axin2− hepatocytes (Extended Data Fig. 4). As expected, most of
the differentially expressed genes between the two populations were
known markers of liver zonation (Extended Data Table 1).
However, we also identified Tbx3, a transcription factor important
in maintaining pluripotency, as a gene upregulated in the Axin2+
population. Notably, Tbx3 marks early hepatoblasts that arise around
Figure 1 | Axin2+ pericentral cells generate
expanding clones of hepatocytes from the central
vein towards the portal vein over time. a, Few
pericentral hepatocytes are labelled in
Axin2-CreERT2;Rosa26-mTmGflox mice following a
single dose of tamoxifen and traced for 2 days.
EpCam labels bile ducts. b, c, Labelled pericentral
cells express GS (b) but not CPS (c). d, g, j, The
120-day trace (d), 240-day trace (g) and 365-day
trace (j) show expansion of labelled cells which can
replace hepatocytes at the portal vein (j inset,
arrow). e, f, h, i, k, l, Pericentral cells maintain GS
expression (e, h, k), while labelled progeny acquire
CPS expression (f, i, l). m, In situ hybridization
for Axin2. n, All labelled cells express HNF4α,
including cells at the portal vein (arrow).
o, Quantification of labelled hepatocytes over time.
Data show individual measurements and the
mean. n = 4 animals for each time point.
*P < 0.05, two-tailed unpaired t-tests. CV, central
vein; PV, portal vein. Scale bars, 100 μm.
day 10 of embryogenesis, and is required for the earliest anlage of the
liver. By in situ hybridization, we confirmed that Tbx3 is
uniquely expressed in the single layer of pericentral cells (Fig. 2d).

In Axin2-CreERT2 mice, one copy of the endogenous Axin2 gene
is inactivated. Since Axin2 is a negative feedback regulator of Wnt
signalling, we considered the possibility that inactivation of one
allele could confer a proliferative advantage after Wnt stimulation.
To address this concern, we compared the DNA synthesis rate of
Axin2+ hepatocytes in wild-type versus Axin2-CreERT2+/− mice
by 5-ethynyl-2'-deoxyuridine (EdU) incorporation. We used GS as
the surrogate marker for Axin2+ cells in wild-type animals because a
suitable antibody for murine liver Axin2 staining does not exist. We
found no difference in the DNA synthesis rate in the two strains of
mice (Extended Data Fig. 5). We also controlled for the possibility of
Figure 2 | Axin2+ cells self-renew. a, The majority of pericentral hepatocytes
are labelled in Axin2-CreERT2;Rosa26-mTmGflox mice given five doses of
tamoxifen and traced for 7 days. b, c, The 90-day trace (b) and 365-day
trace (c) show that all pericentral cells remain labelled (see insets). Note that
non-labelled cells do not occupy the pericentral region over time. d, In situ
hybridization of Tbx3. CV, central vein; PV, portal vein. Scale bars, 100 μm.
Figure 3 | Axin2+ hepatocytes proliferate faster than other hepatocytes.
a, Quantification of EdU+ cells within Axin2− and Axin2+ hepatocyte
populations. Data represent mean ± s.e.m. n = 5 animals. *P < 0.05,
two-tailed unpaired t-test. b, All pericentral hepatocytes are labelled with nuclear


liver injury from tamoxifen administration, which could serve as a
proliferative stimulus. We examined DNA synthesis by GS+
pericentral hepatocytes in mice treated with corn oil or tamoxifen,
comparing wild-type and Axin2-CreERT2+/− animals. We found no
difference as measured by EdU incorporation (Extended Data
Fig. 5). We conclude that neither Axin2 gene dosage nor tamoxifen
affects proliferation of Axin2+ cells in our mouse model.


Axin2+ cells proliferate faster than other hepatocytes


The fact that Axin2+ cells repopulate most of the liver lobule over
time implies a rate of proliferation that is greater than that of Axin2−
hepatocytes. We quantified the DNA synthesis rates of the two cell
populations as already described. Axin2-CreERT2;R26-mTmGflox
mice were labelled with tamoxifen and then given seven daily doses
of EdU. We found that Axin2+ cells undergo DNA replication twice
as frequently as Axin2− hepatocytes (Fig. 3a).
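
A comparison of this kind can be sketched in a few lines. The example below is a minimal illustration, assuming per-animal percentages of EdU+ cells and a Student's two-sample t statistic with pooled variance; the numbers are invented placeholders rather than measured data, and the helper name `unpaired_t` is hypothetical. Looking up the two-tailed P value from the t distribution is left out.

```python
import math
from statistics import mean, variance


def unpaired_t(a, b):
    """Student's two-sample t statistic (pooled variance) and its
    degrees of freedom for two independent groups."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2


if __name__ == "__main__":
    # Hypothetical per-animal percentages of EdU+ cells (n = 5 per group).
    axin2_pos = [10.2, 11.5, 9.8, 10.9, 11.1]
    axin2_neg = [5.1, 5.6, 4.9, 5.4, 5.3]
    t, df = unpaired_t(axin2_pos, axin2_neg)
    print(f"fold difference = {mean(axin2_pos) / mean(axin2_neg):.2f}")
    print(f"t = {t:.2f}, df = {df}")
```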

To verify this result, we also examined the replicative activity of
Axin2+ cells by label dilution, in which cells are tagged initially with a
fixed amount of a stable product, which undergoes dilution with each
cell division. We used the Axin2-rtTA;TetO-H2B-GFP transgenic
mouse, which expresses a stable histone 2B-GFP fusion protein in
Axin2+ cells when given doxycycline. Additionally, the reverse
tetracycline-controlled transactivator in Axin2-rtTA mice is under the control
of a mouse Axin2 expression cassette, thus leaving the endogenous
Axin2 gene locus unaffected. Once activated by doxycycline, the
H2B-GFP protein remains stably expressed until the labelled cell undergoes
cell division, when the H2B-GFP protein is divided between the daughter
cells, resulting in diminished GFP signal intensity.

day chase (c) and 28-day chase (d) after doxycycline. e, Quantification of
GFP-labelled nuclei. Data show individual measurements and the mean. n = 4
animals per group. *P < 0.05, two-tailed unpaired t-tests. f, GFP intensity in
day-0, day-14 and day-28 chase animals. Vertical axis shows number of events
detected. CV, central vein; PV, portal vein. Scale bars, 100 μm.


Doxycycline was administered continuously for 7 days, at which
time cells lining the central vein were labelled with nuclear GFP
(Fig. 3b). Following a 14-day chase period after cessation of
doxycycline, we observed that the number of GFP-labelled cells had
expanded, while the peak GFP signal intensity had decreased
(Fig. 3c, f). The expansion of GFP-labelled cells around the central
vein is concentric, consistent with the Axin2-CreERT2 lineage tracing
results, and is seen up to 28 days after cessation of doxycycline
(Fig. 3d). By 56 days after cessation of doxycycline very few
GFP-labelled cells are seen, and no labelled cells are observed after 84 days
(Extended Data Fig. 6). We quantified the number of GFP+ cells at
each time point and found that the cells divide approximately once
every 14 days (Fig. 3e). We further confirmed this by FACS analysis,
which showed step-wise dilution of the peak GFP signal intensity
every 14 days (Fig. 3f). Thus, pericentral Axin2+ cells actively
proliferate during adult homeostasis, with an estimated cell cycle time
of 14 days.
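
The arithmetic behind a label-dilution estimate can be sketched as follows. This is a minimal illustration, assuming that every division exactly halves the per-cell H2B-GFP signal and that the label is otherwise stable; the intensities are hypothetical arbitrary units, not values from the study, and the function names are invented for the example.

```python
import math


def divisions_from_dilution(initial_intensity, later_intensity):
    """Number of divisions implied if each division halves the label."""
    return math.log2(initial_intensity / later_intensity)


def cycle_time_days(initial_intensity, later_intensity, chase_days):
    """Average number of days per division over the chase period."""
    n = divisions_from_dilution(initial_intensity, later_intensity)
    if n <= 0:
        raise ValueError("no dilution observed; cycle time is undefined")
    return chase_days / n


if __name__ == "__main__":
    # Hypothetical peak GFP intensities at day 0, 14 and 28 of the chase.
    peak = {0: 8000.0, 14: 4000.0, 28: 2000.0}
    for day in (14, 28):
        print(day, cycle_time_days(peak[0], peak[day], day))
```

With these placeholder values the signal halves once per 14 days, so both chase points give the same estimate. Real data would be noisier, and a fit across all time points would be more robust than any single ratio.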


Axin2+ cells are mostly diploid


Mature hepatocytes are mostly polyploid, which is associated with
decreased proliferative potential and increased senescence. Stem cells
are typically diploid, a property that may be necessary for unlimited
duplication. We used FACS sorting to isolate Axin2+ cells and
evaluated their ploidy status with Hoechst 33342 staining (Extended
Data Fig. 7). As expected, the majority of unsorted hepatocytes are
polyploid (Fig. 4a, c, left bar). In contrast, the majority of Axin2+ cells
are diploid (Fig. 4b, c, middle bar). To confirm that Axin2+ diploid
cells give rise to polyploid cells, we labelled Axin2+ cells with
tamoxifen, traced them for one year and isolated GFP+ cells for ploidy
analysis. We found that the ploidy distribution of the GFP+ cells at
the end of the trace was identical to that of unsorted hepatocytes
(Fig. 4c, right bar), suggesting that the descendants of Axin2+ cells
mature normally into polyploid cells after leaving the pericentral zone.
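
As a rough illustration of how a ploidy distribution is tallied from DNA-stain intensities, the sketch below bins cells into 2N, 4N and 8N+ classes. It is a simplification under stated assumptions: the signal is taken to scale linearly with DNA content, the position of the 2N peak is supplied by the user, and the cut-offs (1.5 and 3 times the 2N peak) are illustrative midpoints rather than the gates used in the study. All names and values are hypothetical.

```python
from collections import Counter


def ploidy_class(dna_signal, diploid_peak):
    """Assign 2N, 4N or 8N+ from DNA-stain signal relative to the 2N peak."""
    ratio = dna_signal / diploid_peak
    if ratio < 1.5:   # closer to 1x than to 2x the diploid signal
        return "2N"
    if ratio < 3.0:   # closer to 2x than to 4x the diploid signal
        return "4N"
    return "8N+"


def ploidy_distribution(signals, diploid_peak):
    """Fraction of cells in each ploidy class."""
    counts = Counter(ploidy_class(s, diploid_peak) for s in signals)
    total = len(signals)
    return {k: counts.get(k, 0) / total for k in ("2N", "4N", "8N+")}


if __name__ == "__main__":
    # Hypothetical per-cell signals (arbitrary units), 2N peak at 100.
    signals = [100, 105, 98, 210, 400, 95, 190, 102]
    print(ploidy_distribution(signals, diploid_peak=100))
```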


Central vein endothelium acts as a Wnt-producing niche 


The strict localization of Axin2+ cells to the central vein suggested a
local source of Wnt. We screened the normal liver for all nineteen
mammalian Wnts by in situ hybridization. Two of these, Wnt2 and
Wnt9b, were expressed exclusively in endothelial cells around
the central vein (Fig. 5a, b), co-localizing with the endothelial cell
marker Pecam1 (Fig. 5c, d). We also isolated liver endothelial cells
by FACS and confirmed that both Wnt2 and Wnt9b are highly
expressed in endothelial cells by quantitative reverse-transcription
PCR (Fig. 5e and Extended Data Fig. 8). Thus endothelial cells at
the central vein produce Wnt2 and Wnt9b as short-range signals
for pericentral Axin2+ cells and may constitute their niche.


Wnt signals are required for pericentral cell proliferation 


The simultaneous and overlapping expression of two Wnt family
members suggests functional redundancy of Wnt signalling at the
central vein. To directly test whether endothelial-cell-derived Wnt
proteins function to maintain the precursor state of pericentral cells,
we conditionally deleted Wntless (Wls), a Wnt-specific transporter
molecule required for proper Wnt protein secretion, specifically in
endothelial cells. We crossed a VE-cadherin-CreERT2 mouse, which
has a tamoxifen-inducible Cre-recombinase under the control of
an endothelial-cell-specific promoter, with Wlsflox/flox mice. Adult
VE-cadherin-CreERT2;Wlsflox/flox animals (Wlsf/f) were given multiple
Figure 4 | Axin2+ hepatocytes are mostly diploid. a, FACS plot of
hepatocytes stained with Hoechst 33342 and gated for diploid (2N), tetraploid (4N)
and octaploid or greater (8N+) cells. b, FACS plot of Axin2+ hepatocytes
stained with Hoechst 33342. FSC-W, forward scatter pulse width. c, Ploidy
distribution within unsorted hepatocyte population (left), Axin2+ hepatocyte
population (centre), or labelled hepatocytes after lineage tracing for one
year (right). n = 3 animals per group.


doses of tamoxifen to induce conditional deletion of Wls in endothelial
cells. These mice appeared healthy without obvious systemic defects
and their livers appeared grossly and histologically normal (Extended
Data Fig. 9). Compared to VE-cadherin-CreERT2;Wlsflox/+ (Wlsf/+)
control animals, Axin2 expression was decreased in pericentral
hepatocytes of Wlsf/f mice (Fig. 6a-c), consistent with loss of Wnt signalling
around the central vein. Concurrently, there was loss of pericentral
hepatocyte function, as shown by significantly decreased levels of GS
expression (Fig. 6d-f). Importantly, pericentral cells, labelled by GS,
exhibited significantly decreased proliferation rates in Wlsf/f mice
compared to Wlsf/+ controls (Fig. 6g). The decreased rate approached the
proliferation rate of GS− hepatocytes, though it remained significantly
higher. This is probably due to the mosaic nature of tamoxifen-induced
Wls inactivation. We conclude that endothelial-cell-derived Wnt
signals are necessary for maintaining the high proliferative state of
pericentral cells.
Figure 5 | Central vein endothelial cells produce Wnt proteins and act as
a niche for pericentral cells. a, b, In situ hybridization of Wnt2 (a) and
Wnt9b (b) showing mRNA expression at the central vein in endothelial cells
(arrows, inset). c, d, Co-in situ hybridization of Pecam1 (red) and Wnt2 (green)
(c) and Pecam1 and Wnt9b (d) showing co-expression in endothelial cells
lining the central vein (arrows). e, Quantitative RT-PCR of Axin2, Wnt1, Wnt2
and Wnt9b of FACS-isolated liver cells. EC, endothelial cells. Data represent
mean ± s.e.m.; n = 5 animals. Scale bars, 100 μm (a, b), 20 μm (c, d).


Discussion 

In this paper we present a new view of hepatocyte homeostasis in the 
uninjured liver (Fig. 6h). We have identified a Wnt-responsive cell 
population that resides within a confined niche around the central 
vein. These cells self-renew and contribute to hepatocyte maintenance 
by differentiating into and replacing other hepatocytes along the hep- 
atic lobule in the normal liver. The existence of this pericentral cell 
population suggests that the fundamental mechanisms regulating 
liver renewal are similar to other organs in which homeostatic renewal 
involves small populations of stem cells that maintain the tissue. In the 
liver however, our model is novel because it was previously thought 
that all hepatocytes are equivalent in their renewal potential. In con- 
trast, we show that hepatocytes are made up of more than one cell type 
and are not equivalent in replicative ability during homeostasis. Given 
the properties of the cell population under study, we postulate that the 
Wnt-responsive pericentral cells are hepatocyte stem cells. 

Several features make pericentral cells unique compared to other 
hepatocytes. Although pericentral cells express markers common to 
other hepatocytes, they also specifically express Axin2, Tbx3 and GS 
while lacking CPS. Pericentral cells proliferate at a higher rate com- 
pared to other hepatocytes, an observation that is consistent with ref. 
30. Furthermore, pericentral cells possess a diploid genome, in con- 
trast to most other hepatocytes, which are polyploid. Finally, and most 
importantly, while pericentral Axin2+ cells can differentiate into all
hepatocytes along the lobule, including those that line the portal vein,
Axin2− hepatocytes do not replace pericentral cells during homeostasis.
As pericentral cells can self-renew over the long term and
differentiate into other hepatocytes, we suggest that they fit the functional
definition of a stem cell.
Figure 6 | Central-vein-derived Wnt proteins are required for pericentral
cell proliferation. a, b, Axin2 in situ hybridization in Wlsf/+ (a) and Wlsf/f
(b) mice. c, Quantification of Axin2 in situ signal. Data represent
mean ± s.e.m.; n = 5 animals per group. *P < 0.05, two-tailed unpaired t-test.
d, e, GS in situ hybridization in Wlsf/+ (d) and Wlsf/f (e) mice. f, Quantification
of GS in situ signal. Data represent mean ± s.e.m.; n = 5 animals per group.


The diploid nature of pericentral cells is important and surprising,
although nuclear size measurements in rat livers have suggested the
presence of smaller nuclei near the central vein. This sheds light on
a long-standing question in liver biology. Mature polyploid hepatocytes
display chromosomal abnormalities and impaired replication.
By maintaining a diploid genome, the pericentral cells
would, like stem cells, retain unlimited replicative potential. It is
interesting to note that during the cell cycle, levels of Wnt signalling
peak at the G2/M phase. If Wnt proteins regulate expression of
mitotic control genes such as the phosphatase Cdc25, they could
direct cells to mitosis and continued diploidy rather than to
non-mitotic DNA replication and polyploidy.

A defining feature of pericentral cells is their localization to a Wnt-rich
anatomical niche. While Wnt-regulated genes such as β-catenin
and Apc are known to function in liver development and zonation,
the types and sources of Wnt have not been identified. We found that
Wnt9b is specifically expressed in endothelial cells at the central vein,
adjacent to the pericentral cells, while Wnt2 is expressed in both the
sinusoidal and central vein endothelial cells. Notably, Wnt2 produced
by sinusoidal endothelial cells is known to be important for hepatocyte
regeneration after injury. Similarly, in other stem cell niches,
lipid-modified Wnt signals act as short-range cues, maintaining stem
cells in the immediate vicinity of the niche but not outside.

It has been suggested that there may be a periportal source of new 
hepatocytes under normal conditions. Our lineage tracing studies
do not exclude the possibility that other sources of hepatocytes exist 
during homeostasis since after one year the descendants of pericentral 
cells replace on average only 40% of hepatocytes within the liver. 
However, a portal-based population would be regulated differently, 
since we find no expression of Wnt9b by the portal vein endothelium. 

Liver is known to regenerate efficiently after injuries such as partial 
hepatectomy or chemical insult. It has been reported that during 
regeneration after chemical damage, a Wnt-responsive population 
*P < 0.05, two-tailed unpaired t-test. g, Quantification of EdU+ cells within
GS− or GS+ hepatocyte populations in control and Wls-knockout animals. Data show
individual measurements and the mean. n = 4 animals per group. *P < 0.05,
two-tailed unpaired t-test. h, Schematic of hepatocyte homeostatic renewal by
pericentral cells. CV, central vein; PV, portal vein. Scale bars, 100 μm.


of cells near the portal vein can be labelled by the Lgr5 receptor gene.
These cells, unlike pericentral Axin2+ cells, do not express hepatocyte
genes, but subsequently differentiate into bile duct epithelial cells and
hepatocytes and thus could be similar to injury-induced oval cells.
Clearly, Lgr5+/oval cells are distinct from the cells we identify here, as
pericentral cells maintain hepatocyte homeostasis in the uninjured
liver while Lgr5+/oval cells have only been reported after injury.

The stem cell marker Tbx3 is expressed widely in early liver hepatoblasts
and is important for hepatoblast proliferation and initiation of
hepatocyte differentiation. Our finding that pericentral cells also
express Tbx3 leads to the intriguing hypothesis that pericentral cells
may represent the persistence of an embryonic hepatocyte progenitor
population into a self-renewing cell population in the mature liver.

It is noteworthy that liver cancer is often characterized by loss-of-function
mutations in negative regulators of the Wnt pathway,
including Axin and APC. In a mouse model of liver cancer caused
by Met overexpression, liver tumours were found to arise exclusively
from cells located at the central vein, suggesting that pericentral
Axin2+ cells, normally controlled by a paracrine Wnt signal, are
precursors to liver cancer. This would explain why liver tumours
contain mostly diploid cells, an observation that was earlier rationalized
by polyploid hepatocytes becoming diploid after oncogenic
transformation.


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
METHODS

Animals. B6 and FVB mice (Charles River Laboratories) were used for wild-type
analysis. Axin2-CreERT2 mice and VE-cadherin-CreERT2 mice have been previously
described. Rosa26-mTmGflox (Gt(ROSA)26Sortm4(ACTB-tdTomato,-EGFP)Luo/J),
Wlsflox (129S-Wlstm1.1Lan/J), Axin2-rtTA (B6.Cg-Tg(Axin2-rtTA2S*M2)7Cos/J),
and TetO-H2B-GFP (Tg(tetO-HIST1H2BJ/GFP)47Efu/J) mice were obtained
from The Jackson Laboratory. All alleles were heterozygous, except where stated.

For lineage tracing studies, Axin2-CreERT2;Rosa26-mTmGflox mice 8-12
weeks of age received intraperitoneal injections of tamoxifen (Sigma, 4 mg per
25 g mouse weight) dissolved in 10% ethanol/corn oil (Sigma) either once or on
five consecutive days. Representative figures of Axin2-CreERT2;Rosa26-mTmGflox
lineage tracing (Figs 1a-l and 2a-c) are from n = 5 animals per time point.
Quantification of the area of labelled hepatocytes (Fig. 1o) was performed on
Axin2-CreERT2;Rosa26-mTmGflox mice given five daily doses of tamoxifen and
lineage traced.
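
The weight-scaled tamoxifen dose described above is a simple proportion. A minimal Python sketch follows; the function name and the example body weights are hypothetical, and only the ratio of 4 mg per 25 g body weight comes from the text.

```python
def tamoxifen_dose_mg(body_weight_g, dose_mg=4.0, per_weight_g=25.0):
    """Scale the stated dose (4 mg per 25 g body weight) to one animal.

    body_weight_g is the animal's weight in grams. The default ratio is the
    one given in the Methods; everything else here is illustrative.
    """
    if body_weight_g <= 0:
        raise ValueError("body weight must be positive")
    return dose_mg * body_weight_g / per_weight_g


# Hypothetical body weights (g), not taken from the paper.
for weight in (20.0, 25.0, 30.0):
    dose = tamoxifen_dose_mg(weight)
    print(f"{weight:.0f} g mouse -> {dose:.2f} mg per injection")
```

Under the five-dose regimen the per-injection amount is unchanged; only the number of injections differs.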

For label dilution studies, Axin2-rtTA;TetO-H2B-GFP mice 8-12 weeks old
received doxycycline hyclate (Sigma, 1 mg ml−1) in drinking water for 7 days,
and were then chased for 0, 14, 28, 56 or 84 days. Representative figures of Axin2-rtTA;
TetO-H2B-GFP label dilution (Fig. 3b-d) are from n = 4 animals per time point.

For the endothelial cell conditional knockout of Wntless, VE-cadherin-
CreERT2;Wlsflox mice (Wls-knockout and heterozygous control animals) aged 8-10 weeks
received intraperitoneal injections of tamoxifen on five consecutive days and were
sacrificed 7 days after the last dose of tamoxifen. For proliferation studies (Fig. 6g),
mice were given seven consecutive daily doses of EdU after the last dose of tamoxifen
and were sacrificed 2 h after the last EdU dose.

All animal experiments and methods were approved by the Institutional 
Animal Care and Use Committee at Stanford University. Mice used in this study 
were age- and gender-matched littermates including both sexes. All mice were 
housed in the animal facility of Stanford University on a 12-h light/dark cycle 
with ad libitum access to water and normal chow except when otherwise indi- 
cated. The animal experiments were not randomized. The investigators were not 
blinded to allocation during experiments and outcome assessment. 

Statistics. The statistical analysis used to measure significance is the two-tailed
unpaired Student's t-test. The G*Power calculator (G*Power 3.1.9.2) was used for
sample size calculations. For an α probability of 0.05, a power (1 − β) of 0.8,
expected observed differences between the two comparison groups of 50% and
assuming 15% standard error of the mean within each group, the sample size
required to detect a statistically significant difference is four animals per group.
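
As an illustration of the two-tailed unpaired Student's t-test named above, the following Python sketch computes the pooled-variance t statistic and compares it with a tabulated critical value. The measurements are hypothetical and the helper name `unpaired_t` is an assumption; the authors' analysis software is not stated beyond G*Power for sample size.

```python
from math import sqrt
from statistics import mean, variance

# Two-tailed critical values of Student's t at alpha = 0.05 for a few
# degrees of freedom (standard table values).
T_CRIT_05 = {4: 2.776, 6: 2.447, 8: 2.306}


def unpaired_t(a, b):
    """Return (t, df) for Student's unpaired t-test with pooled variance."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t, df


# Hypothetical measurements for two groups of four animals each.
control = [10.0, 12.0, 11.0, 13.0]
knockout = [5.0, 6.0, 7.0, 6.0]

t, df = unpaired_t(control, knockout)
significant = abs(t) > T_CRIT_05[df]
print(f"t = {t:.2f}, df = {df}, significant at 0.05: {significant}")
```

With four animals per group the test has six degrees of freedom, so the difference is called significant at the 0.05 level only when |t| exceeds 2.447.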
Liver histology and immunofluorescence. Mouse livers were fixed in 4% PFA 
overnight at 4°C, cryoprotected in 30% sucrose for 24h at 4°C then embedded 
in OCT and snap frozen. All immunofluorescence staining was performed in the 
dark. Cryosections (10 μm) were incubated in blocking buffer (5% normal donkey
serum, 0.5% Triton-X in PBS) at room temperature and stained with primary
and secondary antibodies, then mounted in Prolong Gold with DAPI mounting 
medium (Life Technologies). The following primary antibodies were used: GFP 
(chicken, 1:500, Abcam ab13970), GS (mouse, 1:500, Millipore MAB302), CPS 
(rabbit, 1:100, gift from W. Lamers), EpCAM (rabbit, 1:100, Developmental 
Studies Hybridoma Bank, clone g8.8), HNF4α (rabbit, 1:100, Santa Cruz
Biotechnology sc-8987). 

Hepatocyte proliferation assay. Hepatocyte proliferation in vivo was measured 
by 5-ethynyl-2'-deoxyuridine (EdU) uptake. In brief, mice received a dose of 
intraperitoneal EdU (Life Technologies, 50 mg per kg mouse weight) daily for 
7 days and were harvested half a day after the final EdU dose. For EdU detection,
cryosections were first stained with the appropriate primary and secondary
antibodies, then incubated with the reagents in the Click-iT EdU Alexa Fluor 
555 Imaging Kit (Life Technologies) prepared according to the manufacturer’s 
instructions and mounted in Prolong Gold with DAPI mounting medium. 
Liver cell isolation and flow cytometry. Hepatocytes and liver endothelial cells 
were isolated from mice by a two-step collagenase perfusion technique with 
modifications. In brief, after the inferior vena cava was cannulated and the portal
vein was cut, the liver was perfused at 10 ml min−1 through the inferior vena cava
with Liver Perfusion Medium (Invitrogen) at 37 °C for 10 min, followed by
perfusion with collagenase type IV (Wellington) for an additional 10 min. The liver
was dissociated and passed through a 70 μm filter. Hepatocytes were separated
from non-parenchymal cells (NPCs) by low-speed centrifugation (30g for
3 min × 3), and further purified by Percoll gradient centrifugation as previously
described. NPCs were pelleted from supernatant by centrifugation (300g for
5 min × 3) and then stained with cell surface markers for endothelial cell isolation
and flow cytometric analysis as previously described.

Cells were analysed on FACS ARIA II (BD). Data were processed with 
FACSDiva 8.0 software (BD) and FlowJo v10 (FlowJo). Doublets were excluded 
by FSC-W × FSC-H and SSC-W × SSC-H analysis. Single-stained channels were
used for compensation and fluorophore minus one controls were used for gating. 


The following antibodies were used: CD31-PE (eBioscience 12-0311-81), CD34- 
FITC (eBioscience 11-0341-85), rat VEGFR3 (gift from B.-S. Ding) with anti-rat 
Alexa Fluor 647 secondary (Jackson Immuno 712-605-153). Sinusoidal endothelial 
cells were identified as VEGFR3+CD31+CD34− cells. Central vein endothelial
cells were identified as VEGFR3−CD31+CD34+ cells.
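
The marker definitions above amount to a small decision rule. The sketch below expresses it in Python, assuming each marker has already been reduced to a positive/negative call by gating; the function name and the "unclassified" fallback are hypothetical and not part of the published workflow.

```python
def classify_endothelial(cd31_pos, cd34_pos, vegfr3_pos):
    """Assign a liver endothelial population from three marker calls.

    Sinusoidal endothelial cells: VEGFR3+ CD31+ CD34-.
    Central vein endothelial cells: VEGFR3- CD31+ CD34+.
    Other marker combinations are left unclassified.
    """
    if not cd31_pos:
        return "non-endothelial"
    if vegfr3_pos and not cd34_pos:
        return "sinusoidal endothelial"
    if cd34_pos and not vegfr3_pos:
        return "central vein endothelial"
    return "unclassified"


# Hypothetical events as (CD31, CD34, VEGFR3) positive/negative calls.
events = [(True, False, True), (True, True, False), (False, True, False)]
for event in events:
    print(event, "->", classify_endothelial(*event))
```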

Hepatocyte ploidy measurement. Hepatocyte ploidy staining was performed as
previously described. Wild-type FVB mice were used for baseline hepatocyte
ploidy measurements. For ploidy measurement of Axin2+ hepatocytes, Axin2-
CreERT2;Rosa26-mTmGflox mice were given five daily doses of tamoxifen (4 mg
per 25 g body weight) and cells were isolated 2 days or 1 year after the last dose
of tamoxifen and stained with Hoechst 33342 (Invitrogen). Cells were analysed
on FACS ARIA II (BD). Data were processed with FACSDiva 8.0 software
(BD) and FlowJo v10. FACS plots (Fig. 4a, b) are representative ploidy plots
from n = 5 wild-type animals and n = 3 Axin2-CreERT2;Rosa26-mTmGflox
mice, respectively.

Real-time RT-PCR analysis. Liver endothelial cells were FACS sorted as 
described above. Cells were homogenized using QIAshredder (Qiagen) and total 
RNA was purified using an RNeasy mini kit (Qiagen) according to the manu- 
facturer’s instructions. The total RNA was reverse transcribed using random 
primers (High Capacity cDNA Reverse Transcription kit, Life Technologies). 
Gene expression was then assayed by real-time PCR using TaqMan Gene 
Expression Assays (Applied Biosystems) on an ABI 7900HT real-time PCR system.
The following TaqMan probes were used: Axin2 (Mm00443610), Wnt1
(Mm01300555), Wnt2 (Mm00470018), Wnt9b (Mm00457102). 

RNAscope in situ hybridization. Paraffin-embedded liver sections (5 μm) were
processed for RNA in situ detection using the RNAscope 2-plex Detection Kit
(Chromogenic) according to the manufacturer’s instructions (Advanced Cell
Diagnostics). RNAscope probes used were: Axin2 (NM_015732, region 330-
1287), GS (NM_008131, region 103-973), Wnt2 (NM_023653, region 857-2086),
Wnt9b (NM_011719, region 727-1616), PECAM1 (NM_001032378, region 915-
1827), Tbx3 (NM_198052).

Representative figures of in situ hybridization of Tbx3 (Fig. 2d), Wnt2, Wnt9b
and Pecam1 (Fig. 5a-d) are from n = 5 wild-type B6 mice aged 8 weeks.

Representative figures of in situ hybridization from VE-cadherin-CreERT2;
Wlsflox studies (Fig. 6a, b, d, e) are from n = 5 mice from each group.
Microscope image acquisition and quantification. All sections were imaged
using the Axioplan 2 microscope, the AxioCam MRm (fluorescence) and
MRc5 (bright field) cameras and using Axiovision AC software (Release 4.8,
Carl Zeiss). Image acquisitions were done at room temperature using ×10 NA
0.3, ×20 NA 0.5, and ×40 NA 0.75 EC Plan-Neofluar objectives. Co-localization
images were obtained by confocal microscopy using a Leica SP5 confocal
detection system fitted on a Leica DMI6000 inverted microscope equipped with
a ×20 NA 0.75 HC PL apochromatic glycerol-immersion objective, and a ×40
NA 1.3 HCX PL apochromatic oil-immersion objective (Leica) and using Leica
LAS AF system software. For fluorescence area quantification, tiled images of
entire liver sections were acquired using a Zeiss Cell Observer Spinning Disc
confocal system on an Axio Observer.Z1 inverted microscope with Zen 2012
software (blue edition). Image acquisitions were done at room temperature using
a ×20 NA 0.5 EC Plan-Neofluar objective.

Tiled images were stitched together in Zen 2012 (blue edition) and quantified
using ImageJ. For some images, contrast, colour and dynamic range were globally
adjusted in Adobe Photoshop (Adobe Systems). Nuclei for EdU-labelled and
total cell counts were quantified using ImageJ. Thresholding and watershed
transforms were used. Pericentral hepatocytes were identified with either GFP (in
Axin2-CreERT2;Rosa26-mTmGflox mice) or GS (in wild-type or VE-cadherin-
CreERT2;Wlsflox mice). GFP+ and GS+ cells were quantified manually. Axin2
and GS in situ images were quantified with RNAscope SpotStudio software
(version 1.0, Advanced Cell Diagnostics).
RNA-seq analysis. Using hepatocytes isolated from Axin2-rtTA;TetO-H2B-GFP
animals, GFP+ and GFP− cells from three animals were sorted using FACS. Cells
were lysed in TRIzol (Life Technologies) and treated with chloroform. The aqueous
layer was precipitated with ethanol and RNA was isolated with the QIAGEN
RNeasy Mini kit following the manufacturer’s instructions. cDNA barcoded
libraries were made using the TruSeq Stranded mRNA Sample Prep Kit
(Illumina) following the manufacturer’s instructions. Samples were sequenced
on an Illumina HiSeq 2000 instrument (three samples per lane, 100-bp paired-end
reads) to yield >50 million reads per sample.

Processing and analysis of FASTQ files were performed using Galaxy. A
custom Galaxy instance (UC Davis Bioinformatics Core) was run on Amazon
AWS. Removal of adaptor contamination and quality trimming were performed
using Scythe v1.21 (https://github.com/ucdavis-bioinformatics/scythe) and Sickle
v1.21 (https://github.com/ucdavis-bioinformatics/sickle). TopHat v2.0.11 was
used to align reads to the mouse mm10 assembly, and Cuffdiff v.2.2.2 was used for
differential gene expression analysis.

The data discussed in this publication have been deposited in NCBI’s Gene
Expression Omnibus and are accessible through GEO Series accession number
GSE68806. 
Extended Data Figure 1 | Leakiness in Axin2-CreERT2;Rosa26-mTmGflox
mice is not observed in animals injected with corn oil. a, c, No GFP labelling
is seen in Axin2-CreERT2;Rosa26-mTmGflox mice after a single dose of
corn oil and traced for 2 days (a) or 365 days (c). b, d, No GFP labelling is
seen after five consecutive daily doses of corn oil and traced for 7 days (b) or
365 days (d). All animals were 8-week-old Axin2-CreERT2;Rosa26-mTmGflox
mice. Images are representative images from n = 5 mice per condition and
time point. CV, central vein; PV, portal vein. Scale bars, 100 μm.
Extended Data Figure 2 | Axin2 expression remains restricted to pericentral
cells. a-c, In situ hybridization for Axin2 in 120-day trace (a), 240-day trace
(b) and 365-day trace (c) Axin2-CreERT2;Rosa26-mTmGflox mice after a single
tamoxifen injection (4 mg per 25 g, × 1). Representative in situ images are
from n = 5 animals per time point. CV, central vein; PV, portal vein. Scale bars, 100 μm.
Extended Data Figure 3 | Descendants of Axin2+ cells replaced 30% of the area of the liver. Tiled image of entire liver section of a 365-day trace
Axin2-CreERT2;Rosa26-mTmGflox mouse. Image is representative of n = 5 animals at this time point. Scale bar, 2,500 μm.
Extended Data Figure 4 | FACS sorting gates for GFP+ cells in Axin2-
rtTA;TetO-H2B-GFP mice. Eight-week-old Axin2-rtTA;TetO-H2B-GFP
mice were labelled with doxycycline for 7 days and chased for various lengths of
time. Hepatocytes were enzymatically dispersed and sorted by FACS.
a-c, Successive gating shows sequential selection of all hepatocytes (a), single
cells by forward scatter (b), and side scatter (c). d, Dead cells were excluded
by propidium iodide labelling. e, GFP-positive cells were gated and either
sorted for RNA-seq analysis or further graphed as histograms for GFP intensity
analysis (see Fig. 3g).
Extended Data Figure 5 | Axin2 gene dosage and tamoxifen have no effect
on pericentral hepatocyte proliferation rate. Wild-type and Axin2CreERT2+/−
mice were given EdU daily for 7 days. A subset of wild-type and
Axin2CreERT2+/− mice was given 4 mg of tamoxifen per 25 g body weight
daily for 5 days. Pericentral hepatocytes were identified by Hnf4α+/GS+ antibody
staining. All other hepatocytes were identified by Hnf4α+/GS− antibody
staining. The EdU-positive rates within the two hepatocyte populations as a
percentage of total HNF4α+ cells were essentially the same regardless of
Axin2 gene dosage or tamoxifen administration. n = 5 animals per group.
Data represent mean ± s.e.m. *P > 0.05.
Extended Data Figure 6 | Axin2+ hepatocytes proliferate rapidly. Axin2-
rtTA;TetO-H2B-GFP mice were given doxycycline for 7 days. a, 56 days
after cessation of doxycycline, very few GFP+ cells are seen around the
central vein. b, After 84 days, no GFP+ cells are seen. Images are representative
of n = 4 animals per time point.


©2015 Macmillan Publishers Limited. All rights reserved 


[FACS gating plots: gates labelled FSC singlets (97.6), SSC singlets (99.5) and PI negative (71.4), plus one gate value of 96.0 whose label did not survive extraction; axis labels FSC-A and PE-A.]

Extended Data Figure 7 | FACS sorting gates for GFP+ cells in Axin2-
CreERT2;Rosa26-mTmGflox mice for ploidy analysis. Eight-week-old
Axin2-CreERT2;Rosa26-mTmGflox mice were labelled with five daily doses
of tamoxifen and traced for 7 days. Hepatocytes were enzymatically dispersed
and sorted by FACS. a-c, Successive gating shows sequential selection of all
hepatocytes (a), single cells by forward scatter (b), and side scatter (c). d, Dead
cells were excluded by propidium iodide labelling. e, GFP-positive cells were
gated and graphed as histograms for Hoechst staining (see Fig. 4).
Extended Data Figure 8 | FACS sorting gates for endothelial cells. Eight-week-old
wild-type C57B6 mice were used for endothelial cell isolation. Livers
were enzymatically digested, hepatocytes were removed by centrifugation
and nonparenchymal cells were antibody stained and sorted by FACS.
a-c, Successive gating showed sequential selection of non-parenchymal cells by
size (a), single cells by forward scatter (b), and side scatter (c). d, Dead cells were
excluded by DAPI labelling. e, Endothelial cells were identified by CD31-
phycoerythrin-positive staining. f, Sinusoidal endothelial cells (SEC) were
identified as CD34-FITC− VEGFR3-APC+ while central vein endothelial cells
(CEC) were identified as CD34-FITC+ VEGFR3-APC−.
Extended Data Figure 9 | Histology of VE-cadherin-CreERT2;Wlsflox/flox
animal versus control. a, Control (VE-cadherin-CreERT2;Wlsflox/+) animals
given five daily doses of tamoxifen and traced for 7 days after the last tamoxifen
dose. Haematoxylin and eosin staining of the liver shows normal histology.
b, Wls-knockout animals (VE-cadherin-CreERT2;Wlsflox/flox) also showed
normal liver histology. Images are representative images from n = 5 animals
per group. Insets show central veins. Scale bars, 100 μm.
Extended Data Table 1 | Partial list of differentially expressed genes in Axin2+ vs Axin2− hepatocytes by RNA-seq analysis

Gene                                           log2 fold change   q value
Glutamine synthetase (Glul)                    -2.58912           0.00961319
Leukocyte cell-derived chemotaxin 2 (Lect2)    -1.85582           0.00961319
Axin2                                          -1.96625           0.00961319
B glycoprotein (Rhbg)                          -3.1865            0.00961319
Cytochrome P450 1a2 (Cyp1a2)                   -2.02709           0.00961319
T-box transcription factor 3 (Tbx3)            -1.26286           0.0380302
Asparaginase (Aspg)                             1.25545           0.0441798
Glutaminase 2 (Gls2)                            1.53395           0.0441798

Cells were isolated from Axin2-rtTA;TetO-H2B-GFP mice after labelling with doxycycline. Genes preferentially expressed in Axin2+ hepatocytes are indicated by negative log2 fold changes and genes preferentially
expressed in Axin2− hepatocytes are indicated by positive log2 fold changes. q value (false-discovery-rate-adjusted P value) <0.05 used to determine significance.
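
To make the sign convention in this table concrete, the Python sketch below converts each log2 fold change into a linear enrichment and labels the population in which the gene is higher. The values are copied from the table; pairing each q value with a gene by row order is an assumption, as are the function and variable names.

```python
# (gene, log2 fold change, q value) as listed in Extended Data Table 1.
# Pairing of q values with genes follows the row order (an assumption).
ROWS = [
    ("Glul", -2.58912, 0.00961319),
    ("Lect2", -1.85582, 0.00961319),
    ("Axin2", -1.96625, 0.00961319),
    ("Rhbg", -3.1865, 0.00961319),
    ("Cyp1a2", -2.02709, 0.00961319),
    ("Tbx3", -1.26286, 0.0380302),
    ("Aspg", 1.25545, 0.0441798),
    ("Gls2", 1.53395, 0.0441798),
]


def summarize(rows, q_cutoff=0.05):
    """Keep rows with q below the cutoff and report linear enrichment."""
    summary = []
    for gene, log2_fc, q in rows:
        if q >= q_cutoff:
            continue
        # Negative log2 fold change means higher expression in Axin2+ cells.
        enriched_in = "Axin2+" if log2_fc < 0 else "Axin2-"
        summary.append((gene, enriched_in, 2 ** abs(log2_fc)))
    return summary


for gene, enriched_in, fold in summarize(ROWS):
    print(f"{gene:7s} higher in {enriched_in} hepatocytes, about {fold:.1f}-fold")
```

A log2 fold change of magnitude 1 corresponds to a twofold difference, so every gene listed differs by more than twofold between the two populations.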




ARTICLE 


doi:10.1038/nature14685 


Structure of the eukaryotic MCM complex at 3.8 Å


Ningning Li, Yuanliang Zhai, Yixiao Zhang, Wanqiu Li, Maojun Yang, Jianlin Lei, Bik-Kwoon Tye & Ning Gao


DNA replication in eukaryotes is strictly regulated by several mechanisms. A central step in this replication is the 
assembly of the heterohexameric minichromosome maintenance (MCM2-7) helicase complex at replication origins 
during G1 phase as an inactive double hexamer. Here, using cryo-electron microscopy, we report a near-atomic
structure of the MCM2-7 double hexamer purified from yeast G1 chromatin. Our structure shows that two single
hexamers, arranged in a tilted and twisted fashion through interdigitated amino-terminal domain interactions, form
a kinked central channel. Four constricted rings consisting of conserved interior β-hairpins from the two single
hexamers create a narrow passageway that tightly fits duplex DNA. This narrow passageway, reinforced by the offset
of the two single hexamers at the double hexamer interface, is flanked by two pairs of gate-forming subunits, MCM2 and
MCM5. These unusual features of the twisted and tilted single hexamers suggest a concerted mechanism for the melting
of origin DNA that requires structural deformation of the intervening DNA. 


For DNA to be replicated, two strands of the duplex DNA must be
separated so that each can serve as a template for the synthesis of daughter
strands. In both prokaryotes and eukaryotes, DNA unwinding is
carried out by specialized helicases that encircle and translocate along
one of the DNA strands. However, the mechanisms for the initial melting
or unwinding of origin DNA are markedly different. In bacteria, the
origin recognition protein DnaA initiates origin melting and then
recruits hexameric helicase DnaB to unwind DNA by translocation
on the lagging strand in the 5′-3′ direction. By contrast, in eukaryotes,
the origin recognition complex first binds replication origins without
effecting initial origin melting, and then loads two MCM2-7 (ref. 4)
single hexamers with the help of the DNA replication proteins Cdc6
and Cdt1 onto double-stranded origin DNA to form a double hexamer.
This inactive assembly of proteins is known as the pre-replicative complex.
Subsequent activation of the MCM2-7 complex takes place in the S
phase, and requires several factors and cell-cycle-specific kinases,
resulting in the formation of an active replicative helicase, the Cdc45-
MCM2-7-GINS (CMG) complex. CMG translocates along the leading
strand in a 3′-5′ direction to unwind duplex DNA (steric exclusion
model). However, how origin DNA is melted before active replication
elongation is unknown. This process probably requires the reconfiguration
of MCM2-7 helicase, a complex molecular motor that has
defied high-resolution structural analysis for decades. At present, much
of the mechanistic insight comes from low-resolution structures of the
MCM2-7 complex in functional forms from different species, as
well as crystal structures of simpler archaeal versions, in non-functional
oligomers, truncations or a chimaeric hexamer.

In this study, we purified the endogenous MCM2-7 double hexamer
from G1 chromatin of budding yeast (Extended Data Fig. 1a-c),
and determined its cryo-EM structure at an overall resolution of 3.8 Å
(gold-standard Fourier shell correlation 0.143 criterion) (Extended
Data Fig. 1k). Except for peripheral regions, the core of the map is
better than 3.5 Å (Extended Data Fig. 1j), which enabled atomic
model building for ~80% of sequences of this 1.2-megadalton
(MDa) complex. Our structure reveals rich details for the organization
of this large complex, and informs many functional aspects of this
replicative helicase, particularly in the initial origin melting.


Overall structure and domain organization 


A first glimpse of the structure is the tilted arrangement of two single
hexamers, with a 14° wedge in between (Fig. 1a-c), a feature already
noticed from low-resolution data of the MCM2-7 double hexamer
and the SV40 large tumour antigen. The two single hexamers also
have a twisted arrangement (Fig. 1a, side panel), resulting in the
misalignment of two hexamer axes. The quality of the density map
allowed an independent assignment of six subunits, being 2-6-4-7-3-5
(viewed from the carboxy-terminal domain (CTD) ring) (Fig. 1d),
consistent with the well-established model. Notably, when
viewed from the single hexamer axis, the gravity centres of three major
structural components, namely NTD-A (A subdomain of N-terminal
domain (NTD)), oligonucleotide/oligosaccharide-binding (OB)-fold
(C subdomain of NTD) and CTD, fall onto three eccentric circles
(Fig. 1d). While the circles of NTD-As and OBs are nearly concentric,
the CTD circle exhibits apparent rotational and translational offsets,
indicating a relative shift and twist between the NTD and CTD rings
within the single hexamer. Also, the NTD-As and OBs for each
subunit are nearly vertically arranged (as indicated by their centres
falling along the same radial lines), with slight rotations in opposite
directions for MCM4 and MCM5. Notably, the CTDs of all six
subunits have left-handed twists to varying extents (Fig. 1d, f) with
respect to their OBs and NTDs. Furthermore, the distances between
neighbouring CTDs are different (Fig. 1e), showing a 4-Å difference
between tightly (4:7, 7:3 and 5:2) and loosely (6:4, 3:5 and 2:6)
packed groups. While six OBs form a plane perpendicular to the hexamer
axis, the CTDs and NTD-As display marked axial variations
(Fig. 1f, g).
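
The comparison of neighbouring CTD distances described above can be sketched as a centre calculation followed by pairwise distances around the ring. The Python example below uses an idealized, perfectly symmetric ring with made-up coordinates, and an unweighted centroid stands in for the gravity centre (the exact definition is in the paper's Methods, so this is an assumption). Only the subunit order 2-6-4-7-3-5 is taken from the text.

```python
from math import cos, dist, radians, sin

# Subunit order around the ring as stated in the text.
RING_ORDER = [2, 6, 4, 7, 3, 5]


def centroid(points):
    """Unweighted mean position of a list of (x, y, z) coordinates."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def neighbour_distances(centres, order=RING_ORDER):
    """Distance between the centres of each adjacent pair around the ring."""
    pairs = zip(order, order[1:] + order[:1])
    return {(a, b): dist(centres[a], centres[b]) for a, b in pairs}


# Hypothetical coordinates (in Å): three pseudo-atoms per domain, with the
# six domains placed every 60 degrees on a circle of radius 30 Å.
centres = {}
for k, subunit in enumerate(RING_ORDER):
    angle = radians(60 * k)
    atoms = [(30 * cos(angle) + dx, 30 * sin(angle), 0.0) for dx in (-1.0, 0.0, 1.0)]
    centres[subunit] = centroid(atoms)

distances = neighbour_distances(centres)
for (a, b), d in distances.items():
    print(f"MCM{a}:MCM{b} centre-to-centre distance = {d:.1f} Å")
```

In this symmetric toy ring all six distances are equal; in the reported structure the tightly and loosely packed pairs differ by about 4 Å.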


Inter-hexamer interface 


The head-to-head stacking of the two hexamers is largely mediated by 
their zinc-fingers (ZFs; B subdomain of NTD), as expected from 
Figure 1 | Overall structure and domain organization of the MCM2-7
double hexamer. a-c, Side-views of the cryo-EM density map superimposed 
with the atomic model. Unsharpened map (a) is displayed from the two-fold 
axis, and sharpened map (b, c) displayed with indicated rotations relative to 
a along the cylinder axis. The side panels of a and b illustrate the tilted 


previous studies. Notably, consistent with sequence analysis
(Extended Data Fig. 2), the ZF of MCM3 is a degenerate version
without zinc binding (Extended Data Fig. 3a-c). Twelve ZFs arrange
into two stacked rings at the interface (Fig. 1h-l), with an apparent
centre shift (Fig. 1l and Supplementary Video 1). Inter-ZF interactions
are versatile, displaying completely different patterns at opposite
sides of the wedged interface (Fig. 1b, subpanel). Although ZFs are
more horizontally arranged at the thin 3-5-7 edge (Fig. 1h), they are
nearly vertical at the thick 2-4-6 edge (Fig. 1j). ZF interactions are
largely from their polar residues, dominated by two pairs of ZF5:ZF3′
(Fig. 1h) and two pairs of ZF6:ZF2′ (Fig. 1j), as measured in buried
surfaces (Extended Data Table 1a). Different ZF orientations at the
hexamer interface perfectly explain the observed tilt and twist
between the two single hexamers, because this unique arrangement
would enable comprehensive close contacts for both edges and lead
to the stabilization of the double hexamer.

Eukaryotic MCM proteins distinguish themselves from archaeal 
counterparts by many subunit-specific sequence extensions at their 
N and C termini (NTE and CTE, respectively) and insertions within 
functional domains. Both MCM4 and MCM6 have a very long linker 
between their OBs and CTDs (Extended Data Fig. 4a), which could be 
the underlying basis of the observed twist between NTD and CTD 
rings in the single hexamer (Fig. 1d). Notably, many sequence insertions
and extensions also markedly contribute to the double hexamer
stabilization (Fig. 2). For example, an insertion located on the β-turn
of the OB from MCM6 (Extended Data Fig. 4h) interacts with the
ZF of MCM2 on the other hexamer (Fig. 2e). The most unique
inter-hexamer interactions involve MCM3, MCM5 and MCM7.
Compared with archaeal MCMs, they have longer sequences at their
N termini (Extended Data Fig. 2), which form extended strands or
loops (Extended Data Fig. 4e, g and i). MCM7 also has a long insertion
(~70 residues) at its NTD-A (N-terminal insertion (NTI)) (Extended
Data Fig. 4a), folding into a helix-turn-helix motif (Extended Data



arrangement of the two single hexamers. d, f, Top (d) and side (f) views of the 
organization of the indicated structural domains. Small coloured balls denote 
gravity centres (see Methods) of these domains. e, Distances between the 
gravity centres of adjacent CTDs. g, Radial projection of f. h-l, Side and top 
views of the segmented maps (sharpened) of ZFs at the hexamer interface. 


Fig. 4i). The N terminus of MCM5 extends into the space between ZFs 
of MCM3 and MCM7 from the other hexamer, and forms interac- 
tions with B-strands of both ZFs (Fig. 2f and Supplementary Video 2). 
Furthermore, the long NTI of MCM7 extends towards the opposite 
MCM5 and interacts with its NID-A (Fig. 2c, g). The N terminus of 
MCM7 also interacts with the N terminus of MCM3 from the other 
hexamer (Fig. 2h). On the basis of the calculated buried surfaces for 
the above interfaces (Fig. 2e-h), the contribution of NTIs and NTEs to 
the double hexamer stability is even greater than the ZF interactions 
(Extended Data Table 1a). Importantly, most insertions and exten- 
sions involved in inter-hexamer stabilization (Fig. 2) are conserved in 
higher eukaryotes, suggesting a universal importance of these eukar- 
yotic-specific sequences. 
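
A buried surface comparison of this kind rests on simple bookkeeping of solvent-accessible surface areas (SASA). The sketch below assumes the common definition in which the buried area is the SASA lost when two partners form a complex. The paper does not state which convention or program was used, some tools report half of this value (the area per partner), and the numbers are hypothetical.

```python
def buried_surface(sasa_a, sasa_b, sasa_complex, per_partner=False):
    """Buried surface area (Å^2) from the SASA of two partners and their complex.

    per_partner=True returns half of the total, a convention used by some tools.
    """
    buried = sasa_a + sasa_b - sasa_complex
    if buried < 0:
        raise ValueError("complex SASA exceeds the sum of the isolated partners")
    return buried / 2 if per_partner else buried

# Hypothetical SASA values in Å^2, for illustration only.
total_buried = buried_surface(5000.0, 6000.0, 10200.0)
half_buried = buried_surface(5000.0, 6000.0, 10200.0, per_partner=True)
```

Whichever convention is chosen, it has to be applied to every interface before contributions such as those of the NTIs, NTEs and ZFs are compared.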

MCM2, 4 and 6 have very long NTEs, which are targets of cellular 
signalling kinases’”*"'. These NTEs are highly disordered in our struc- 
ture, and their involvement in inter-hexamer interaction is unknown. 


Intersubunit interaction 


The intersubunit interactions are very similar, and can be categorized 
into three tiers based on their axial locations (Extended Data Fig. 5a-c). 
The first one is between two contacting CTDs (ATPase domains), 
largely composed of hydrophobic interactions, as exemplified by a 
tight stacking between two surface-exposed helices from two respect- 
ive CTDs (Extended Data Fig. 5a). The second tier, on the neck region 
of the hexamer, involves four conserved hairpins or loops from two 
adjacent subunits, including allosteric communication loop (ACL) 
and helix-2-insert (H2I) of the first subunit, and H2I and presensor 
1 β-hairpin (PS1-HP) of the flanking second subunit (Extended Data
Fig. 5b). Atomic interactions at the neck interfaces are versatile, invol- 
ving different residues from these loops. However, a large proportion 
of them are polar residues, indicating electrostatic or hydrogen- 
bonding interactions dominate these interfaces. The third tier, 
contributed by the ZF of one subunit and two loops from the OB of 

Figure 2 | Inter-hexamer interactions
contributed by NTEs and NTIs. a-d, Side views of
the MCM2-7 double hexamer, with indicated
rotations around the cylinder axis. Atomic
structure is superimposed with the unsharpened
map. The sequence elements involved in
inter-hexamer interactions are highlighted in blue and
red representing single hexamer 1 (SH1) and
SH2, respectively. e-h, Zoomed-in views of the
boxed regions from b-d. Buried areas (Å²) of these
interfaces in e-h are labelled. BS, buried surface.

a flanking subunit (Extended Data Fig. 5c), is largely composed of
hydrophobic interactions between respective β-strands or loops
(for example, see Extended Data Fig. 6d). Perturbation of this inter- 
face by an MCM4 mutation (Phe391Ile) causes pre-replicative com- 
plex assembly defects in yeast and mammary carcinoma in mouse”. 
In addition, a mutation on MCM5 NTD-A (Phe83Leu) that results in 
Dbf4-dependent kinase (DDK)-independent activation** is close to 
this interface. 

Intersubunit interactions are further enhanced by the NTIs of 
MCM3, 5, 6 and 7, which contact the NTD-As of their neighbouring 
MCM7, 3, 2 and 4, respectively (Extended Data Fig. 5d-i). Compared 
with other pairs, the interface of MCM5-MCM2 is without NTI 
involvement, a feature that may facilitate the gap opening observed 
between them during hexamer loading and activation’®””’. At lower 
contour levels, four CTEs containing the winged-helix DNA binding 
motif could be identified for MCM4, 5, 6 and 7 (Extended Data 
Fig. 3d, e). The flexibility of these winged-helix-containing CTEs 
suggests that they are not involved in intersubunit interaction, con- 
trasting the role of winged-helix motifs in the origin recognition 
complex structure”. 

The buried surfaces of the six subunit interfaces are sharply dif- 
ferent (Extended Data Table 1a), with the smallest at 2:6, rather than 
at the gate-forming 2:5 interface. The weak 2:6 interface gives rise 
to a unique side channel (13 Å), wide enough to accommodate
single-stranded DNA (ssDNA) (Extended Data Fig. 7 and Supplementary
Video 1), in contrast to archaeal MCM structure with six side chan- 
nels”. A ssDNA extrusion model has been proposed for the function 
of side channels in DNA unwinding”****. However, unwinding 
studies’*"*” generally conflict with this model. A definitive function 
for this unique 2:6 side channel remains to be examined. 


ATPase active centres 


Remarkable conformational differences lie at the six ATPase centres 
of the CTD ring. Comparisons of them indicate that two active cen- 
tres, 2:6 and 5:2, are apparent outliers. Their ATP-binding pockets are 
less compact, with sensor elements (sensors 2 and 3, and arginine 
finger) in MCM2 and MCM6, respectively, considerably shifted away 
from nucleotides, and the displacements are as large as 4-5 Å, measured
by Cα atoms of sensor 3 residues (Fig. 3a, b). Further analysis
was done by comparisons of representative centres from the compact 
and loose groups with an active ATPase centre from papillomavirus 
E1 crystal structure or an inactive one from an archaeal MCM
structure*’. Indeed, while the conformational differences of the four
compact centres relative to that of E1 are small (for example, dimers
of 7:3 and 4:7; Extended Data Fig. 8h, i), the loose ones display sharp
differences from that of E1 (Fig. 3c, d). Moreover, the nucleotide
occupancies at the centres of 3:5 and 6:4 dimers are comparatively 
low (Extended Data Fig. 9), consistent with their reported nearly null 
ATPase activities*’. On the basis of the active centre arrangement and 
nucleotide occupancy, it appears that only dimers of 7:3 and 4:7 are 
active. This observation agrees with the extremely low ATPase activity 
observed in MCM2-7 as a double hexamer’’, and the reported activity 
of 7:3 dimer comparable to that of the whole MCM2-7 complex”. 


Figure 3 | Inactive ATPase centres of 2:6 and 5:2 dimers. a, b, Comparison of
the ATP-binding pockets from 2:6 and 5:2 dimers with that from 7:3 dimer. 
Dimers (cis:trans) are presented with cis and trans ATPase elements on the left 
and right, respectively. c, d, Same as a and b, but the ATPase centres of 2:6 and 
5:2 dimers are compared with that of the hexameric E1 helicase**. Motifs of 
Walker A (WA), Walker B (WB), arginine finger (AF), sensor 2 and sensor 3 
are displayed in stick model. The directions and distances of sensor 3 
movements are marked. All alignments were done using the Walker A and B 
motifs as a reference.

Previously, individual active centres were proposed to have distinct
roles in regulating helicase activities***'*. Supported by our data, 
allosteric regulation of these ATPase centres, orchestrated by the
orientation changes between adjacent CTDs, might be the basis for
factor-dependent control of helicase activities during different rep- 
lication stages. 


Axial displacement of interior hairpin loops 


As in many hexameric AAA+ machineries, the central-pored cham- 
ber of MCM2-7 complex is decorated with layers of hairpin loops. For 
archaeal MCM, four layers of conserved loops essential for DNA 
binding and/or unwinding have been described**’, with two of them 
located innermost (Fig. 4a). The first one, composed of six H2Is, was 
previously shown to undergo axial movement depending on the nuc- 
leotide-binding states of the ATPase domains™. The other, composed 
of β-turn motifs of the six OBs, was shown to coordinate the
binding of ssDNA to the MCM-ssDNA binding motifs on the chan- 
nel surface of OBs in the crystal structure of an archaeal MCM NTD 
homohexamer”. In the MCM2-7 complex, these two layers of loops 
are placed in axially staggered positions (Fig. 4b, c), and particularly, 
six H2Is roughly display a helical trajectory (Fig. 4c). An alignment 
with the ssDNA-bound archaeal MCM hexamer”’ precisely placed 
ssDNA between these two layers of loops (Fig. 4b, c). In addition, 
when a double-stranded DNA (dsDNA) is placed in the channel, 
the H2Is show very close contact with it, capable of inserting their 
terminal loops into its major or minor grooves consecutively (Fig. 4d). 
At a very low threshold, an extra piece of fragmented density, which 
might be the residual dsDNA, could be identified within the channel 
at the H2I ring, but its sub-stoichiometric occupancy prevented pos- 
itive identification and further analysis. Nevertheless, the snug fitting 
of the helically arranged H2Is and dsDNA suggests that H2I might be 




Figure 4 | Spatial distribution of conserved hairpins in the single hexamer. 
a, Bottom view (from the hexamer interface) of the MCM2-7 hexamer. The 
six MCM subunits are shown in alternating grey scales, with four hairpins 
highlighted in indicated colours. All four hairpins are at the subunit interface, 
and extend towards the adjacent subunit. b, c, Side views of the inner surface 
of the hexamer, docked with a model ssDNA from an archaeal MCM 
structure (PDB accession code 4POG)”’. d, Contacts between H2Is and dsDNA 
illustrated through fitting a dsDNA fragment into the central channel of
the hexamer.

involved in the initial melting of origin DNA. These observations,
together with repeated reports of the axial displacement of interior 
loops of AAA+ hexameric machines*”*’, suggest that the MCM2-7 
complex uses a conserved mechanism involving cycles of ATP bind- 
ing and hydrolysis to control the axial positions of the interior loops to 
facilitate DNA translocation and unwinding. 


Central channel and model of initial origin melting 


The diameter of the central channel in the MCM2-7 single hexamer is not
uniform (Fig. 5a, b): about 30 Å at the C-terminal end and 40 Å at the
N-terminal end, but with two constriction sites (~25 Å) at H2Is and
β-turn motifs that are just wide enough to accommodate dsDNA
(Fig. 5c, d). However, owing to the twisted stacking between two ZF 
rings in the double hexamer (Fig. 5f), the channel is partially blocked 
at the hexamer interface by the ZF rings, splitting the wide channel at 
the double hexamer interface into a main central channel and two 
minor channels (Fig. 5f and Supplementary Video 1). The overlap- 
ping central channel is just about the size of dsDNA (Fig. 5f, g), while 
the minor channels are not wide enough for the passage of dsDNA 
(Fig. 5h) but accessible from the outside. Notably, gate-forming sub- 
units MCM2 and MCM5 participate in the formation of both chan- 
nels. The overlapping central channel is delineated by ZFs from two 
MCM2-MCM5 dimers and a vertically arranged MCM6 dimer 
(Fig. 5g), and the minor channel involves ZFs from MCM2 and 5 of 
one single hexamer, and MCM3 and 7 of the other single hexamer 
(Fig. 5h). 

The structure of an already constricted central channel of the single 
hexamer that opens to a larger channel at the NTD only to be occluded 
by the offset of the ZF rings at the double hexamer interface invites 


Figure 5 | Central channel and its implication in origin melting. a, b, Cut- 
away views of the density map (unsharpened) with two dsDNA fragments fitted 
in the central channel. c-f, Surface representation (top view) of the atomic 
models of the six H2Is (c), β-turns (d), ZFs (e) from one single hexamer, and
ZFs from both single hexamers (f). Red asterisks mark the positions of two 
minor channels. g, h, Double hexamer interface centred for the views of the 
narrowed central channel (g), and minor channel (h). ZFs not part of the 
constricted central channel are in high transparency. i, Model for initial origin 
melting (see text).

speculations for functions. First, the kink in the central channel
created by the offset of the two single hexamer rings will probably 
cause deformation of trapped duplex DNA (Fig. 5i) to create a nuc- 
leation centre for DNA melting. Second, the tight grip of the duplex 
DNA on either ends by the helically positioned H2Is serves to hold the 
kinked DNA in place such that a slight left-handed rotation between 
the two single hexamers, as previously proposed’’, could further 
deform the origin DNA at the nucleation point. Third, possible relative
rotation between the NTD and CTD rings within single hexamers
upon helicase activation might further lower the activation energy of
DNA melting. We envision that initial melting involves allosteric 
conformational changes, in combination with dsDNA translocation 
in opposite directions by the coupled single hexamers”. The dsDNA 
being pumped into the central channel provides the slack necessary 
for strand separation. This initial melting step requires the activation 
of the MCM2-7 helicase activity most likely by DDK phosphorylation 
and binding of Cdc45 and GINS”. Recent studies showed that 
DDK phosphorylation of the NTEs of MCM2-7 does not cause double
hexamer separation'**°, but promotes MCM2-MCM5 gate opening*’.
Opening of the MCM2-MCM5 gates at this point would merge
central and minor channels, creating an expanded N-terminal cham- 
ber for strand separation. The ssDNA looping out through this cham- 
ber would be accessible to replication factors lurking nearby (Fig. 5i). 
Further strand separation towards the CTD ring may be facilitated by 
the interior β-hairpin loops and the MCM-ssDNA binding motifs”
on the inner surface of OBs. 

This structure-informed hypothesis on the initial origin melting is 
in accordance with previous data. First, many factors required for 
helicase activation, such as Sld2, Sld3, Cdc45 and Mcm10, have 
well-defined ssDNA binding activity''’®****°°. Second, a similar 
replicative helicase SV40 large tumour antigen initiates origin melting 
as a dsDNA pump*”*, and conformational rearrangements of the two
single hexamers were observed during this process*’. Our structure 
suggests that, in addition to its role in processive fork unwinding, 
MCM2-7 is also actively involved in origin DNA melting. In transi- 
tioning from the initial origin melting state to the fork unwinding 
state, MCM2-7 essentially translocates first on dsDNA (dsDNA 
pump) and then along ssDNA (steric exclusion). 

In summary, the fine structural details provided in this work will 
serve as a rich source of information for designing and interpreting 
biochemical studies aimed at dissecting the mechanistic functions of 
the MCM2-7 complex. In particular, it will provide a framework for 
future study of the eukaryotic-specific assembly, activation and regu- 
lation of this helicase family. 


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 


Received 25 March 2015; accepted 25 June 2015. 
Published online 29 July 2015. 


1. O'Donnell, M., Langston, L. & Stillman, B. Principles and concepts of DNA 
replication in bacteria, archaea, and eukarya. Cold Spring Harb. Perspect. Biol. 5, 
a010108 (2013). 

2. Costa, A., Hood, I. V. & Berger, J. M. Mechanisms for initiating cellular DNA
replication. Annu. Rev. Biochem. 82, 25-54 (2013). 

3. Duderstadt, K. E. & Berger, J. M. A structural framework for replication origin 
opening by AAA+ initiation factors. Curr. Opin. Struct. Biol. 23, 144-153 (2013). 

4. Tye, B. K. MCM proteins in DNA replication. Annu. Rev. Biochem. 68, 649-686 
(1999). 

5. Remus, D. et al. Concerted loading of Mcm2-7 double hexamers around DNA 
during DNA replication origin licensing. Cell 139, 719-730 (2009). 

6. Evrin, C. et al. A double-hexameric MCM2-7 complex is loaded onto origin DNA 
during licensing of eukaryotic DNA replication. Proc. Natl Acad. Sci. USA 106,
20240-20245 (2009). 

7. Siddiqui, K., On, K. F. & Diffley, J. F. Regulating DNA replication in eukarya. Cold 
Spring Harb. Perspect. Biol. 5, a012930 (2013). 

8. Heller, R. C. et al. Eukaryotic origin-dependent DNA replication in vitro reveals 
sequential action of DDK and S-CDK kinases. Cell 146, 80-91 (2011).

9. Yeeles, J. T., Deegan, T. D., Janska, A., Early, A. & Diffley, J. F. Regulated eukaryotic 
DNA replication origin firing with purified proteins. Nature 519, 431-435 (2015). 


190 | NATURE | VOL 524 | 13 AUGUST 2015 


10. Tanaka, S. & Araki, H. Helicase activation and establishment of replication forks at
chromosomal origins of replication. Cold Spring Harb. Perspect. Biol. 5, a01037
(2013).

11. Tognetti, S., Riera, A. & Speck, C. Switch on the engine: how the eukaryotic
replicative helicase MCM2-7 becomes activated. Chromosoma 124, 13-26 (2015).

12. Ilves, I., Petojevic, T., Pesavento, J. J. & Botchan, M. R. Activation of the MCM2-7
helicase by association with Cdc45 and GINS proteins. Mol. Cell 37, 247-258 (2010).

13. Fu, Y. V. et al. Selective bypass of a lagging strand roadblock by the eukaryotic
replicative DNA helicase. Cell 146, 931-941 (2011).

14. Rothenberg, E., Trakselis, M. A., Bell, S. D. & Ha, T. MCM forked substrate specificity
involves dynamic interaction with the 5’-tail. J. Biol. Chem. 282, 34229-34234
(2007).

15. McGeoch, A. T., Trakselis, M. A., Laskey, R. A. & Bell, S. D. Organization of the
archaeal MCM complex on DNA and implications for the helicase mechanism.
Nature Struct. Mol. Biol. 12, 756-762 (2005).

16. Costa, A. et al. DNA binding polarity, dimerization, and ATPase ring remodeling in
the CMG helicase of the eukaryotic replisome. eLife 3, e03273 (2014).

17. Graham, B. W., Schauer, G. D., Leuba, S. H. & Trakselis, M. A. Steric exclusion
and wrapping of the excluded DNA strand occurs along discrete external
binding paths during MCM helicase unwinding. Nucleic Acids Res. 39,
6585-6595 (2011).

18. Sun, J. et al. Structural and mechanistic insights into Mcm2-7 double-hexamer
assembly and function. Genes Dev. 28, 2291-2303 (2014).

19. Samel, S. A. et al. A unique DNA entry gate serves for regulated loading of the
eukaryotic replicative helicase MCM2-7 onto DNA. Genes Dev. 28, 1653-1666
(2014).

20. Sun, J. et al. Cryo-EM structure of a helicase loading intermediate containing
ORC-Cdc6-Cdt1-MCM2-7 bound to DNA. Nature Struct. Mol. Biol. 20, 944-951 (2013).

21. Costa, A. et al. The structural basis for MCM2-7 helicase activation by GINS and
Cdc45. Nature Struct. Mol. Biol. 18, 471-477 (2011).

22. Hesketh, E. L. et al. DNA induces conformational changes in a recombinant human
minichromosome maintenance complex. J. Biol. Chem. 290, 7973-7979 (2015).

23. Brewster, A. S. et al. Crystal structure of a near-full-length archaeal MCM:
functional insights for an AAA+ hexameric helicase. Proc. Natl Acad. Sci. USA 105,
20191-20196 (2008).

24. Bae, B. et al. Insights into the architecture of the replicative helicase from the
structure of an archaeal MCM homolog. Structure 17, 211-222 (2009).

25. Slaymaker, I. M. et al. Mini-chromosome maintenance complexes form a filament
to remodel DNA structure and topology. Nucleic Acids Res. 41, 3446-3456 (2013).

26. Fletcher, R. J. et al. The structure and function of MCM from archaeal M.
thermoautotrophicum. Nature Struct. Biol. 10, 160-167 (2003).

27. Froelich, C. A., Kang, S., Epling, L. B., Bell, S. P. & Enemark, E. J. A conserved MCM
single-stranded DNA binding element is essential for replication initiation. eLife 3,
e01993 (2014).

28. Fu, Y., Slaymaker, I. M., Wang, J., Wang, G. & Chen, X. S. The 1.8-Å crystal structure
of the N-terminal domain of an archaeal MCM as a right-handed filament.
J. Mol. Biol. 426, 1512-1523 (2014).

29. Liu, W., Pucci, B., Rossi, M., Pisani, F. M. & Ladenstein, R. Structural analysis of the
Sulfolobus solfataricus MCM protein N-terminal domain. Nucleic Acids Res. 36,
3235-3243 (2008).

30. Miller, J. M., Arachea, B. T., Epling, L. B. & Enemark, E. J. Analysis of the crystal
structure of an active MCM hexamer. eLife 3, e03433 (2014).

31. Cuesta, I. et al. Conformational rearrangements of SV40 large T antigen during
early replication events. J. Mol. Biol. 397, 1276-1286 (2010).

32. Vijayraghavan, S. & Schwacha, A. The eukaryotic Mcm2-7 replicative helicase.
Subcell. Biochem. 62, 113-134 (2012).

33. Bochman, M. L., Bell, S. P. & Schwacha, A. Subunit organization of Mcm2-7 and the
unequal role of active sites in ATP hydrolysis and viability. Mol. Cell. Biol. 28,
5865-5873 (2008).

34. Evrin, C. et al. The ORC/Cdc6/MCM2-7 complex facilitates MCM2-7 dimerization
during prereplicative complex formation. Nucleic Acids Res. 42, 2257-2269
(2014).

35. Slaymaker, I. M. & Chen, X. S. MCM structure and mechanics: what we have learned
from archaeal MCM. Subcell. Biochem. 62, 89-111 (2012).

36. Bochman, M. L. & Schwacha, A. The Mcm complex: unwinding the mechanism of a
replicative helicase. Microbiol. Mol. Biol. Rev. 73, 652-683 (2009).

37. Shima, N. et al. A viable allele of Mcm4 causes chromosome instability and
mammary adenocarcinomas in mice. Nature Genet. 39, 93-98 (2007).

38. Hardy, C. F., Dryga, O., Seematter, S., Pahl, P. M. & Sclafani, R. A. mcm5/cdc46-bob1
bypasses the requirement for the S phase activator Cdc7p. Proc. Natl Acad.
Sci. USA 94, 3151-3155 (1997).

39. Bleichert, F., Botchan, M. R. & Berger, J. M. Crystal structure of the eukaryotic
origin recognition complex. Nature 519, 321-326 (2015).

40. Enemark, E. J. & Joshua-Tor, L. Mechanism of DNA translocation in a replicative
hexameric helicase. Nature 442, 270-275 (2006).

41. Kang, S., Warner, M. D. & Bell, S. P. Multiple functions for Mcm2-7 ATPase motifs
during replication initiation. Mol. Cell 55, 655-665 (2014).

42. Coster, G., Frigola, J., Beuron, F., Morris, E. P. & Diffley, J. F. Origin licensing requires
ATP binding and hydrolysis by the MCM replicative helicase. Mol. Cell 55,
666-677 (2014).

43. Bell, S. D. & Botchan, M. R. The minichromosome maintenance replicative
helicase. Cold Spring Harb. Perspect. Biol. 5, a012807 (2013).

44. Jenkinson, E. R. & Chong, J. P. Minichromosome maintenance helicase activity is
controlled by N- and C-terminal motifs and requires the ATPase domain helix-2
insert. Proc. Natl Acad. Sci. USA 103, 7613-7618 (2006).

©2015 Macmillan Publishers Limited. All rights reserved 


45. Gai, D., Zhao, R., Li, D., Finkielstein, C. V. & Chen, X. S. Mechanisms of conformational
change for a replicative hexameric helicase of SV40 large tumor antigen. Cell 119,
47-60 (2004).

46. On, K. F. et al. Prereplicative complexes assembled in vitro support origin-
dependent and independent DNA replication. EMBO J. 33, 605-620 (2014).

47. Bruck, I. & Kaplan, D. L. The Dbf4-Cdc7 kinase promotes Mcm2-7 ring opening to
allow for single-stranded DNA extrusion and helicase assembly. J. Biol. Chem. 290,
1210-1221 (2015).

48. Bruck, I. & Kaplan, D. L. Cdc45 protein-single-stranded DNA interaction is
important for stalling the helicase during replication stress. J. Biol. Chem. 288,
7550-7563 (2013).

49. Fien, K. et al. Primer utilization by DNA polymerase α-primase is influenced by its
interaction with Mcm10p. J. Biol. Chem. 279, 16144-16153 (2004).

50. Eisenberg, S., Korza, G., Carson, J., Liachko, I. & Tye, B. K. Novel DNA binding
properties of the Mcm10 protein from Saccharomyces cerevisiae. J. Biol. Chem.
284, 25412-25420 (2009).


Supplementary Information is available in the online version of the paper. 


Acknowledgements We thank X. Li for providing programs in data collection, motion
correction and frame-based analysis, and J. Wang for advice on modelling and model
refinement. We also thank the National Center for Protein Sciences (Beijing, China) for
technical support with cryo-EM data collection and for computation resource. This work
was supported by the Ministry of Science and Technology of China (2013CB910404 to
N.G.), the National Natural Science Foundation of China (31422016 to N.G.), the
Research Grants Council of Hong Kong (GRF664013 and HKUST12/CRF/13G to Yu.Z.)
and the Hong Kong University of Science & Technology (B.-K.T.).

Author Contributions Yu.Z. purified sample; N.L. collected cryo-EM data (with J.L., Yi.Z.
and W.L.), performed image processing, and analyzed structures. N.L., N.G. and M.Y.
performed atomic modelling. N.L., Yu.Z., B.-K.T. and N.G. designed experiments,
interpreted the structure and wrote the manuscript.

Author Information The cryo-EM density map has been deposited in the Electron
Microscopy Data Bank (EMDB) under accession number EMD-6338; and the atomic
model has been deposited in the Protein Data Bank (PDB) under accession number
3JA8. Reprints and permissions information is available at www.nature.com/reprints.
The authors declare no competing financial interests. Readers are welcome to
comment on the online version of the paper. Correspondence and requests for
materials should be addressed to N.G. (ninggao@tsinghua.edu.cn), B.-K.T.
(bt16@cornell.edu), or Yu.Z. (zhai@ust.hk).




ARTICLE 


METHODS 


No statistical methods were used to predetermine sample size. 

Yeast strain. A one-step PCR-based approach” with pTF272
(pFA6a-TEV-6×Gly-3×Flag-HphMX, Addgene) as DNA template was used to generate
the MCM4-TEV-3×Flag tagging modification in the W303-1a background strain.
The resulting strain showed no growth defect compared to its parent W303-1a
strain. 

Sample purification. Forty litres of log-phase G1 yeast cells (3 × 10⁷ to 4 × 10⁷
cells per ml) were collected and processed for spheroplasting to isolate crude 
chromatin as described previously with the following modifications for a 
large-scale preparation. Spheroplasting was performed in 200 ml of spheroplast- 
ing buffer containing sufficient amount of lyticase that was purified from an 
Escherichia coli strain bearing lyticase expressing plasmid pUV5-GI1S (gift from 
S. Gasser). The spheroplasts were lysed with extraction buffer EBX (50 mM
HEPES/KOH, pH 7.5, 100 mM K-glutamate, 10 mM magnesium acetate, 0.25%
Triton X-100, 3 mM ATP, 1 mM dithiothreitol (DTT), 1 mM EDTA, 2 mM NaF,
1 mM NaVO,, 1 mM phenylmethanesulfonylfluoride (PMSF), 2 μg ml⁻¹ pepstatin A
and 1× protease inhibitor cocktail (Roche)). The lysate was layered onto the
top of equal volume of EBX buffer containing 30% sucrose and centrifuged at 
25,000g (Hitachi R20A2) for 15 min. To solubilize chromatin fractions, the crude 
chromatin was digested in 40 ml of freshly made benzonase buffer (50 mM 
HEPES/KOH, pH 7.5, 100 mM K-glutamate, 8 mM MgCl₂, 0.02% NP-40,
3 mM ATP, 1 mM EDTA, 2 mM NaF, 1 mM NaVO,, 1 mM PMSF, 2 μg ml⁻¹
pepstatin A and 1× protease inhibitor cocktail (Roche)) with 1 U μl⁻¹ of benzonase
(71206-3; Merck Biosciences) for 10 min at 37 °C, and then 1 h on ice. The
suspension was then centrifuged for 20 min at 25,000g. The clear phase was 
recovered, and subjected to anti-Flag immunoprecipitation with 1 ml bed volume 
of washed anti-Flag M2 agarose (Sigma) at 4 °C for 2 h. Beads were recovered, and 
washed extensively with benzonase buffer and then tobacco etch virus (TEV) 
buffer (50 mM HEPES/KOH, pH 7.5, 100 mM K-glutamate, 8 mM MgCl₂, 0.02%
NP-40, 3 mM ATP). MCM2-7 complexes were cleaved from the M2 agarose by
overnight incubation at 4 °C in TEV buffer with 100 U ml⁻¹ of AcTEV protease
(Life Technology). His-tagged TEV protease was removed by incubating the
eluate with a TALON metal affinity resin (Clontech) for 30 min at 4 °C. The
MCM2-7 complexes were then applied on the top of 20-40% glycerol gradient 
in buffer EBX with protease inhibitors. The glycerol gradient was centrifuged in a 
TLS-55 rotor (Beckman Optima TLX ultracentrifuge) at 175,000g for 6.5 h. The 
fractions were collected from the top of the gradient after centrifugation. The 
fractions containing the MCM2-7 double hexamers were pooled and processed 
for electron microscopy analysis. 

Electron microscopy. The MCM2-7 double hexamer was concentrated by ultra- 
filtration to remove glycerol. Negative staining of the MCM2-7 double hexamer 
was performed with 2% uranyl acetate. Grids were examined using an FEI T12 
microscope operated at 120 kV, and images were recorded using a 4k × 4k
charge-coupled device (CCD) camera (UltraScan 4000, Gatan). 

For cryo-grid preparation, 4 μl aliquots of samples were applied to a
glow-discharged holey carbon grid (Quantifoil R2/2) coated with a thin layer of freshly
prepared carbon, and cryo-freezing was performed with an FEI Vitrobot Mark IV
(4 °C and 100% humidity). Grids were examined using an FEI Titan Krios operated
at 300 kV, and images were recorded using a K2 Summit direct electron
detector (Gatan) in counting mode, at a nominal magnification of 22,500×,
which renders a final pixel size of 1.32 Å at object scale after post-magnification
calibration, and with the defocus ranging from −1.5 to −2.5 μm. Images were
collected under low-dose condition in a semi-automatic manner using UCSF- 
Image4 (written by X. Li and Y. Cheng). For each micrograph stack, a total of 32 
frames were collected, with a dose rate of ~8.2 counts (~10.9 electrons) per 
physical pixel per second for an exposure time of 8 s. 
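
The exposure parameters above imply a total specimen dose that can be checked with a few lines of arithmetic. This is a minimal sketch, assuming that one physical pixel in counting mode corresponds to the calibrated 1.32 Å pixel at object scale and that all frames have equal duration. The variable names are introduced here for illustration.

```python
# Values quoted in the text above.
dose_rate_e_per_px_s = 10.9   # electrons per physical pixel per second
exposure_s = 8.0              # total exposure time (s)
n_frames = 32                 # frames per micrograph stack
pixel_size_angstrom = 1.32    # calibrated pixel size at object scale (Å)

# Assumption: frames are of equal duration.
frame_time_s = exposure_s / n_frames

# Electrons accumulated in one pixel over the whole exposure.
electrons_per_px = dose_rate_e_per_px_s * exposure_s

# Assumption: one physical pixel covers pixel_size^2 at the specimen.
total_dose_e_per_A2 = electrons_per_px / pixel_size_angstrom ** 2
dose_per_frame_e_per_A2 = total_dose_e_per_A2 / n_frames

print(f"frame time: {frame_time_s:.2f} s")
print(f"total dose: {total_dose_e_per_A2:.1f} e/Å^2")
print(f"dose per frame: {dose_per_frame_e_per_A2:.2f} e/Å^2")
```

Under these assumptions each frame lasts 0.25 s and the accumulated dose is roughly 50 electrons per Å².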

Image processing. An initial 3D model was calculated from negatively stained
particles using RELION*, with a density cylinder as reference. For cryo-EM
data, beam-induced motion correction at the micrograph level was performed
as previously described (program written by X. Li)**. Micrograph screening,
automatic particle picking and normalization were done with SPIDER”.
CTFFIND3 (ref. 56) was used to estimate the contrast transfer function
parameters. 2D and 3D classification and refinement were performed with
RELION. A total of 347,801 particles (with a binning factor of two) from
2,230 micrographs were subjected to a cascade of 2D and 3D classification.
Analysis of the classification structures indicated that there is a C2 axis
perpendicular to the cylinder axis of the MCM2-7 double hexamer, reflecting a
symmetric arrangement in which one single hexamer is related to the other by
a simple 180° rotation. A final, structurally homogeneous data set of 85,365
particles, selected because their classification structures reached
considerably higher resolution, was used in full window size (300 × 300)
for high-resolution refinement with C2 symmetry imposed. From the orientation
distribution (Extended Data Fig. 1h, i), there is a wide equator belt with a
complete distribution of particles, along with two regions with relatively more 
particles. Nevertheless, this type of uneven distribution did not affect our final 
reconstruction, as particles from the equator belt have provided sufficient 
information for a complete sampling of the central slices in the Fourier space. 
Symmetry-free refinement was also performed, resulting in generally similar 
but slightly worse density maps. To improve the resolution further, different 
combinations of movie frames were used for motion correction and frame 
averaging. The first two frames had large motions, therefore, frames 3-16 were 
used to sum micrographs. To reduce interpolation errors, particles were 
rewindowed by offsetting translation parameters determined in the 3D refine- 
ment of last round, which improved the resolution to 4.6A (gold-standard 
FSC 0.143 criteria). The final round of refinement was performed with a 
soft-edged mask applied, resulting in a 4.3-A map. After correction for the 
modulation transfer function of K2 detector, and map sharpening using post- 
processing options of RELION with a B-factor of —100 A’, the overall reso- 
lution of the final density map within the region defined by the soft mask is 
3.8 A for the overall map (Extended Data Fig. 1k), after correction of the effect 
of soft mask on the FSC curve*’. Local resolution map was estimated using 
blocres in Bsoft**. From the local resolution map, peripheral regions are asso- 
ciated with worst resolution, while the core region is better than 3.5 A. The 
statistics of the data collection and structural refinement is provided in 
Extended Data Table 1b. 
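
The gold-standard FSC 0.143 criterion quoted above can be illustrated with a minimal sketch. This is not the RELION implementation: the function names are hypothetical, no masking or mask correction is applied, and the two half maps are assumed to be cubic NumPy arrays on the same grid.

```python
import numpy as np

def fourier_shell_correlation(map1, map2, voxel_size):
    """Return (spatial frequency in 1/Å, FSC) for two cubic maps of equal shape."""
    assert map1.shape == map2.shape and len(set(map1.shape)) == 1
    n = map1.shape[0]
    f1 = np.fft.fftshift(np.fft.fftn(map1))
    f2 = np.fft.fftshift(np.fft.fftn(map2))
    # Integer shell index (rounded distance from the Fourier origin) per voxel.
    grid = np.indices(map1.shape) - n // 2
    shell = np.rint(np.sqrt((grid ** 2).sum(axis=0))).astype(int)
    n_shells = n // 2
    keep = shell < n_shells
    idx = shell[keep]
    cross = np.bincount(idx, weights=(f1 * np.conj(f2)).real[keep], minlength=n_shells)
    power1 = np.bincount(idx, weights=(np.abs(f1) ** 2)[keep], minlength=n_shells)
    power2 = np.bincount(idx, weights=(np.abs(f2) ** 2)[keep], minlength=n_shells)
    fsc = cross / np.sqrt(power1 * power2)
    freq = np.arange(n_shells) / (n * voxel_size)
    return freq, fsc

def resolution_at_threshold(freq, fsc, threshold=0.143):
    """Resolution (Å) at the first non-zero shell where the FSC falls below threshold."""
    below = np.nonzero(np.asarray(fsc)[1:] < threshold)[0]
    if below.size == 0:
        return None  # the curve never crosses the threshold
    return 1.0 / freq[below[0] + 1]
```

With real half maps one would pass the pixel size from Extended Data Table 1b (1.32 Å) as `voxel_size`; the reported 3.8 Å additionally depends on the soft mask and its correction, which this sketch omits.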

Model building. Six monomers of the crystal structure of a chimaeric archaeal
MCM (PDB code 4R7Y) (Sulfolobus solfataricus NTD fused with Pyrococcus
furiosus CTD) hexamer were manually docked to the density map of the MCM
double hexamer using Chimera. The docking also confirmed the handedness of
the density map. The rigid-body docking was performed by dividing the crystal
structure of the monomer into four pieces (NTD-A, OB-fold, ZF and CTD)
(Extended Data Fig. 4b, c). Sequence alignments of the yeast MCM proteins
with the crystal template were initially performed using BLAST and manually
adjusted according to the secondary structure prediction of these sequences
(PSIPRED). The predicted secondary structural information of the eukaryotic
subunit-specific sequences was used to assign the six MCM proteins into the
cryo-EM density map. Initial atomic coordinates of the OB-fold subdomains and
CTDs of MCM2-7 proteins were then generated using CHAINSAW in the CCP4 suite.
Models were manually adjusted and built in Coot. Only minor changes were
required for modelling the OB-fold subdomains and CTDs of the yeast MCM
proteins owing to their high sequence identity to the template (Extended Data
Fig. 2). The NTD-As of the yeast MCM proteins contain many sequence
insertions, and the modelling of these sequences was similar to that
described above, but involved multiple rounds of sequence realignment and was
largely facilitated by the predicted secondary structure. In many cases, the
modelling of NTD-A required complete retracing of the main chain based solely
on densities and secondary structural predictions. For regions independent of
the known template (eukaryotic-specific sequences, Extended Data Fig. 4),
poly-alanine models were built first using Coot. Clearly resolved bulky
residues (Phe, Tyr, Trp and Arg) were then used as markers to assign the
primary sequences. As a result, we derived an atomic model of the MCM2-7
double hexamer, for ~80% of its sequences, from a near-atomic cryo-EM density
map, integrated with structural information from other sources. Further model
refinement was done by alternating rounds of model rebuilding in Coot and
real-space refinement (phenix.real_space_refine) in Phenix, with secondary
structure and stereochemical constraints applied. Similar to a previous
cryo-EM work with comparable 3.8-Å resolution, during the real-space
refinement, knowledge-based restraints, including Ramachandran potentials and
rotamer correction, were applied to ensure a proper balance between density
fitting and stereochemical and rotamer distributions. Owing to the resolution
limitation, local densities at the ATP-binding sites could not unambiguously
distinguish between ATP and ADP. For modelling purposes, ADP was docked to
the active centres and similarly refined in Phenix. The atomic model was
cross-validated according to previously described procedures. Specifically,
the coordinates of the final model were randomly displaced by 0.2 Å using the
PDB tools of Phenix. The displaced model was refined against the Half1 map
(produced from a half set of all particles during refinement by RELION). The
refined model from the Half1 map was compared with the Half1 and Half2 maps
in Fourier space to produce two FSC curves, FSCwork (model versus Half1 map)
and FSCfree (model versus Half2 map), respectively (Extended Data Fig. 1l).
Another FSC curve between the refined model from Half1 and the final density
map from all particles (model versus merge) was also produced. The agreement
between FSCwork and FSCfree (no large separation) indicated that the model
was not overfitted. MolProbity (http://molprobity.biochem.duke.edu/) was used
to evaluate the final model, and the final statistics of the model are
provided in Extended Data Table 1b. Notably, application of knowledge-based
restraints during the real-space refinement improved the stereochemical and
rotamer statistics of the model. Comparisons of the representative density
with the atomic model for selected areas are shown in Extended Data Fig. 6
and Supplementary Video 2.

Gravity centres of individual domains (Fig. 1) were determined with
segmented maps of the conserved core regions of these domains (minus variable
loops and linkers) using SPIDER. To determine the cylinder axis of the
hexamer, a plane perpendicular to the axis was determined by least-squares
fitting of the six centres of the OBs in Chimera. Pymol and Chimera were used
for structural analysis and figure preparation. Interface areas of the
intersubunit and inter-hexamer interactions were calculated by PISA, and are
provided in Extended Data Table 1a.
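
The least-squares plane fit used to define the cylinder axis can be sketched as follows. This is an illustration only: the authors performed the fit in Chimera, the function name is hypothetical, and the six coordinates below are made-up points on a regular hexagon, not the measured OB centres.

```python
import numpy as np

def fit_plane(points):
    """Least-squares plane through 3D points: returns (centroid, unit normal).

    The normal is the right singular vector with the smallest singular value
    of the centred coordinates; it plays the role of the cylinder axis.
    """
    points = np.asarray(points, dtype=float)
    centroid = points.mean(axis=0)
    _, _, vt = np.linalg.svd(points - centroid)
    normal = vt[-1]
    return centroid, normal / np.linalg.norm(normal)

# Hypothetical example: six centres on a ring of radius 30 Å in the plane z = 2.
angles = np.arange(6) * np.pi / 3.0
ring = np.column_stack([30.0 * np.cos(angles),
                        30.0 * np.sin(angles),
                        np.full(6, 2.0)])
centre, axis = fit_plane(ring)
print(centre, axis)
```

The sign of the returned normal is arbitrary, so the axis direction has to be fixed by convention (for example, pointing from the NTD ring towards the CTD ring).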
Extended Data Figure 1 | MCM2-7 double hexamer purification and
structural determination. a, A flowchart of the procedure for MCM2-7
double hexamer purification from G1 chromatin of the yeast strain
MCM4-TEV-3 Flag. b, Fractions taken were analysed by SDS-PAGE and
immunoblotting of the indicated MCM subunits. c, The eluted MCM2-7
complexes were subjected to 20-40% glycerol gradient sedimentation at
175,000g for 6.5 h. Collected fractions were analysed by SDS-PAGE and
visualized by silver staining. Molecular size markers used are: ALP 140 kDa
and thyroglobulin 670 kDa. Fractions 10-12 were pooled and concentrated for
cryo-EM analysis. d, A representative raw micrograph of the negatively
stained MCM2-7 double hexamer. Representative 2D class averages of
negatively stained particles produced by reference-free classification are
shown at the top-right corner. The initial 3D model generated using RELION is
shown at the bottom-right corner. e, A representative raw micrograph of
cryo-EM data. f, Representative 2D class averages of cryo-EM particles from
reference-free classification. g, Two typical side views of the average
images from f, in enlarged forms, highlighting well-resolved secondary
structure elements. Extra densities with poor quality on the two ends of the
double hexamer could be attributed to the flexible winged-helix motifs (WH)
within the CTEs of MCM proteins. h, i, Distribution of particle orientations
in the last round of structural refinement, shown in side (h) and top (i)
views. The heights of blue cylinders at different projection directions on
the surface of a hemisphere are proportional to their particle numbers. Two
areas (red asterisks) of a dense equator belt are slightly enriched with
particles. j, The density map of the MCM2-7 double hexamer (sharpened) is
shown in two views, for the outer (left) and inner (right) surfaces. The map
is colour-coded to indicate the range of the local resolution. k, Fourier
shell correlation (FSC) curves for the final 3D density map after
RELION-based post-processing (red, gold-standard FSC), and for the
cross-examination between the final atomic model and the 3D density map
(blue, final refined model versus map). At an FSC 0.143 cut-off, the overall
resolution for the map is 3.8 Å. l, FSC curves for the atomic model
cross-validation. See Methods for details.
Extended Data Figure 2 | Sequence alignment of the MCM2-7 proteins.
The sequences of archaeal (SS, Sulfolobus solfataricus) and yeast
(Saccharomyces cerevisiae) MCM proteins were aligned using BioEdit. The
alignment was further adjusted manually according to the secondary structure
prediction and 3D structural alignment. Conserved hairpins and loops are
labelled (H2I, EXT, PS1 and β-turn). CTEs were aligned by the predicted
secondary elements in the winged-helix (WH) motifs. Eukaryote-specific
sequences (numbered 1-9) well resolved in our structure as in Extended Data
Fig. 4a are labelled.
Extended Data Figure 3 | Structural diversity of ZFs and structural
flexibility of CTEs in the MCM2-7 double hexamer. a-c, Ribbon
representation of ZF motifs of MCM7 (a), MCM3 (b) and MCM5 (c), superimposed
with the sharpened density map (transparent cyan) at the 4σ contour level.
Positions of zinc are denoted by red balls. Zinc binding was not observed in
the ZF of MCM3. d, e, Surface representation of the density map
(unsharpened, at the 1σ contour level) superimposed with colour-coded atomic
structures for each MCM subunit, viewed from the CTD ring. Four segmented
extra densities are coloured in deep grey (d), with tentative fitting of a
winged-helix motif from a crystal structure (PDB code 2KLQ) into these four
density pieces. e, Same as d, but displayed without extra densities.
Extended Data Figure 4 | Subunit-specific structural features of the
MCM2-7 subunits. a, Schematic illustration of domain organization and
subunit-specific features of MCM2-7 subunits, with comparison to the
archaeal MCM (SS, Sulfolobus solfataricus) (see also Extended Data Fig. 2).
Numbered regions correspond to numbered extensions and insertions
highlighted in d-i. ‘~’ symbols denote corresponding regions with densities
reliable enough to trace the main-chain direction, but not sufficient for
atomic modelling. ‘--’ symbols denote sequences with highly disordered
densities. b, c, A protomer of the crystal structure of a chimaeric archaeal
MCM hexamer (PDB code 4R7Y) used as the template for modelling. The archaeal
MCM was aligned globally (b) or domain-based flexibly fitted (c) to the
atomic model of MCM2. d-i, Side-by-side structural comparison of MCM2-7
proteins, with MCM3-7 globally aligned to the atomic model of MCM2. The
well-resolved insertions and extensions of each MCM subunit (d-i) are
numbered and coloured in red.
Extended Data Figure 5 | Intersubunit interactions in the MCM2-7 single
hexamer. a, Interactions at the CTD ring exemplified by the 7:3 interface.
The EXT hairpin of MCM3 facilitates the packing of one helix (the α-linker
of the α/β subdomain) from MCM7 with another helix (located at the α
subdomain of the CTD) from MCM3. b, Interactions at the neck region, as
exemplified by the 6:4 interface. PS1-HP of MCM4 is sandwiched between ACL
and H2I-N (N-terminal loop/helix of H2I) of MCM6. At the same time, ACL of
MCM6 also interacts with H2I-C (C-terminal helix of H2I) of MCM4.
c, Interactions at the NTD ring exemplified by the 6:4 interface. The first
loop of OB (OB-L1) that flanks NTD-A, and the extended β-turn loop from MCM6
form a cradle for docking the ZF from MCM4. Asterisks mark sites of strong
interactions. d-i, Zoomed-in views of intersubunit interactions between
NTD-As of each adjacent MCM pair. The unsharpened density map (transparent
grey), contoured at the 2.7σ level, is superimposed with the atomic model.
Four of the six MCM proteins (3, 5, 6 and 7) contain NTIs at varying
locations of their NTD-As (see also Extended Data Fig. 4). Only the NTI of
MCM7 is modelled in our structure. Superimposition of the atomic model with
the density map indicates that these NTIs all interact with the NTD-As of
the adjacent subunit on the left. Extra densities indicating interactions
are marked by red asterisks.
Extended Data Figure 6 | Cryo-EM densities for different regions of the
MCM2-7 double hexamer. a, Electron microscopy density map (cyan mesh)
superimposed with the atomic model for NTD-A of MCM7. Two representative
α-helices with side chains (right) are displayed in stick representation.
b, Electron microscopy density map for the OB and ZF of MCM3.
A representative loop of the OB and a strand connecting the OB and ZF with
side chains are shown on the right. c, A representative region of
inter-hexamer interaction, highlighting the interactions between the
β-strands of MCM5-NTE and MCM7-ZF. d, A representative region of
intersubunit interaction (MCM4-MCM6), highlighting the hydrophobic
interaction between Met342, Phe391 of MCM4 and neighbouring Ile284 of MCM6.
e, A representative region of conserved hairpin loops, highlighting H2I of
MCM4. Segmented density maps in all panels are displayed at the 5-6σ contour
level.
Extended Data Figure 7 | A unique side channel between MCM2 and
MCM6. a-f, Outer surface representation of the six subunit interfaces within
the MCM2-7 single hexamer. a, A unique side channel in the neck region of the
M2-M6 interface. The boxed region is shown in a zoomed-in view (right) with
individual components (H2I-N, EXT, PS1 and ACL) coloured individually.
The size of this side channel is large enough to act as a pore for ssDNA
exiting from the central channel during DNA unwinding along with basic
residues (Arg566, Lys557 and Lys564) of the EXT hairpin from MCM6. The H2I-N
is partially disordered. b-f, Same as a, but at different subunit
interfaces. In the case of the 3:5 interface (e), the N-C linker of MCM3
also contributes to the blocking of the channel.
Extended Data Figure 8 | ATP-binding site configuration at MCM
intersubunit interfaces. a-f, Zoomed-in views of ATP-binding sites for each
MCM dimer. The Walker A and B (WA and WB, respectively) residues of
the left subunit, and sensor 3, sensor 2 and arginine finger (AF) residues of
the right subunit, are shown in stick model. g, Superimposition of all six
active centres. The sensor 3 residues of MCM2 (orange asterisk) and MCM6
(blue asterisk) in the 5:2 and 2:6 dimers display sharply different
configurations, resulting in two relatively loose centres.
h, i, Superimposition of two representative compact ATPase centres (dimers
of 7:3 and 4:7) with that of E1 hexameric helicase (active form). j, The
ATPase centre (inactive conformation) of an archaeal MCM (PDB code 4R7Y).
k, l, Superimposition of j with the centres of 2:6 (k) and 7:3 (l). Walker A
and B motifs are used as a reference for alignment in all panels. l, A large
shift in the sensor 3 of MCM3 is shown by a red arrow, compared with the
inactive conformation.


©2015 Macmillan Publishers Limited. All rights reserved 


Extended Data Figure 9 | Nucleotide occupancy at the six ATPase centres of
the MCM2-7 single hexamer. a-f, Zoomed-in views of the active centres for
all MCM subunit pairs. The conserved ATPase elements of the active centres
are labelled. Segmented nucleotide densities at a contour level of 5.5σ
were superimposed (transparent grey). Note that nucleotide occupancies at
the centres of 6:4 and 3:5 are relatively low. For the 7:3 dimer, there
seems to be extra density for γ-phosphate or Mg²⁺, but this could not be
confirmed at the current resolution (3.8 Å). Nucleotides were modelled using
ADP.
Extended Data Table 1 | Statistics of structural determination, model refinement and interface analysis.

a

Buried surface areas for inter-subunit interactions

          Total buried   Ratio and number of        Buried surface (Å²)
          surface (Å²)   residues involved
Subunits                 Subunit 1    Subunit 2     Tier 1   Tier 2   Tier 3   Other (N/N, N/ZF, ZF/ZF)
M6/M4     4028           11.5%/114    11.1%/121     1011     1129     1800     230
M4/M7     4122           11.7%/111    10.4%/117     1611     994      1045     591
M7/M3     3825           9.8%/107     11.1%/123     1434     899      893      702
M3/M5     3884           11.4%/107    10.7%/113     1610     1198     994      271
M5/M2     3587           10.1%/103    10.3%/108     1397     1282     999      59
M2/M6     2886           8.6%/78      7.9%/96       1104     869      943      8

Buried surface areas for inter-hexamer interactions (Å²)

Interface   Total   5:7′/7:5′   3:5′/5:3′   2:6′/6:2′   3:7′/7:3′   3:3′   6:6′   5:5′   4:5′/5:4′   2:4′/4:2′
Total       6387    3284        979         847         625         270    114    102    135         30

Contact type      Total   Contributions from individual interfaces
ZF:ZF′            981     442, 293, 114, 102, 30
N:N′              2272    1487, 538, 111, 136
N:ZF′/ZF:N′       2531    1748, 537, 87, 159
β-turn/β-turn′    764     210, 554

Buried surface at the hexamer interface contributed by each subunit

Subunit               M2     M3     M4     M5     M6     M7
Buried surface (Å²)   877    1874   166    4500   961    3909

b

Data collection
  Electron microscope                 Titan Krios
  Voltage (kV)                        300
  Electron detector                   K2 camera
  Electron dose (e/Å²)                50 (32 frames)/22 (frames 3-16)
  Pixel size (Å)                      1.32
3D reconstruction
  Particles for final refinement      85,366
  Resolution of unmasked map (Å)      4.3
  Resolution of masked map (Å)        3.8
  Map sharpening B-factor (Å²)        −100
Model composition
  Peptide chains
  Residues
  Ligands (ADP)
R.m.s. deviations
  Bond lengths (Å)
  Bond angles (°)
Ramachandran plot
  Favoured (%)                        91.0
  Outliers (%)                        1.1
Validation
  MolProbity score                    2.45
  Rotamer outliers (%)                0.06

a, Calculated surface areas of intersubunit and inter-hexamer interfaces. The calculation was done using PISA. At the hexamer interface, there are 25 and 10 residues not built (owing to structural disorder) for
the ZF of MCM5 and the β-turn loop of MCM6, respectively. Therefore, the actual contribution of the MCM5 ZF and MCM6 β-turn to the inter-hexamer interaction could be much larger. b, Statistics of data
processing and model refinement.




doi:10.1038/nature14616 


A giant protogalactic disk linked to the cosmic web 


D. Christopher Martin, Mateusz Matuszewski, Patrick Morrissey, James D. Neill, Anna Moore, Sebastiano Cantalupo,
J. Xavier Prochaska & Daphne Chang


The specifics of how galaxies form from, and are fuelled by, gas
from the intergalactic medium remain uncertain. Hydrodynamic
simulations suggest that ‘cold accretion flows’, in which relatively cool
(temperatures of the order of 10⁴ kelvin), unshocked gas streams
along filaments of the cosmic web into dark-matter halos, are
important. These flows are thought to deposit gas and angular
momentum into the circumgalactic medium, creating disk- or
ring-like structures that eventually coalesce into galaxies that form
at filamentary intersections. Recently, a large and luminous filament,
consistent with such a cold accretion flow, was discovered
near the quasi-stellar object QSO UM287 at redshift 2.279 using
narrow-band imaging. Unfortunately, imaging is not sufficient to
constrain the physical characteristics of the filament, to determine
its kinematics, to explain how it is linked to nearby sources, or to
account for its unusual brightness, more than a factor of ten above
what is expected for a filament. Here we report a two-dimensional
spectroscopic investigation of the emitting structure. We find that
the brightest emission region is an extended rotating hydrogen
disk with a velocity profile that is characteristic of gas in a
dark-matter halo with a mass of 10¹³ solar masses. This giant
protogalactic disk appears to be connected to a quiescent filament that may
extend beyond the virial radius of the halo. The geometry is
strongly suggestive of a cold accretion flow.

Figure 1 | Spectral image and pseudo-slit spectrum of the QSO UM287
field. a, Spectral image of the pseudo-slit (gridded region) in b, displayed in
velocity with respect to the systemic velocity and arcseconds (Δθ) with respect
to the reference (QSO source A, which is QSO UM287) position (z = 2.279).
Flux density I_λ is measured in kilo line units per Å, where 1 kLU Å⁻¹ = 1,000
photon cm⁻² s⁻¹ sr⁻¹ Å⁻¹ or 1.18 × 10⁻¹⁹ erg cm⁻² s⁻¹ arcsec⁻² Å⁻¹. The
signal-to-noise ratio exceeds 7.0. Sources near QSO A are plotted.
b, Narrow-band image generated from the PCWI data cube by summing the flux
density I_λ over the band 3,970-4,000 Å showing the QSO UM287 (source A),
the nearby, fainter QSO (source B), and the bright filament of Lyα emission
extending from UM287. Emission sources C and D are also shown. The data cube
is summed over 7.5 arcsec perpendicular to each position along the
pseudo-slit to form the image in a.

¹Cahill Center for Astrophysics, California Institute of Technology, 1216 East California Boulevard, Mail code 278-17, Pasadena, California 91125, USA. ²Caltech Optical Observatories, Cahill Center
for Astrophysics, California Institute of Technology, 1216 East California Boulevard, Mail code 11-17, Pasadena, California 91125, USA. ³ETH Zurich, Institute for Astronomy, Wolfgang-Pauli-Strasse 27,
8093 Zurich, Switzerland. ⁴Department of Astronomy and Astrophysics, University of California, 1156 High Street, Santa Cruz, California 95064, USA. ⁵University of California Observatories, Lick
Observatory, 1156 High Street, Santa Cruz, California 95064, USA.

We observed the UM287 filament with the Palomar Cosmic Web
Imager (PCWI), an integral field spectrograph that is designed for low
surface brightness measurements using a 40″ × 60″ reflective image
slicer with twenty-four 40″ × 2.5″ slices. For these observations, the
spectrograph covered a range of 3,940-4,110 Å with slit-width-limited
resolution Δλ ≈ 2.5 Å around the redshifted Lyman-α (Lyα) line. The
methodology and details of our observations and data analysis are
discussed in the Methods and described extensively elsewhere.
Data reduction of the PCWI observations resulted in an adaptively
smoothed data cube (consisting of the co-added sum of all the exposures)
of dimensions right ascension, declination, and wavelength
(RA, dec., λ). The bright extended filament is clearly detected in the data
(Fig. 1): it is fainter far from QSO UM287 and shows a relatively narrow
line (velocity) width (Fig. 1). Channel cuts through the smoothed data
cube (Extended Data Figs 2, 4, 5) illustrate a disk morphology, and
demonstrate that neither the QSO subtraction nor the presence of a
nearby line and continuum object (source C) is responsible for the
disk emission that is closest to QSO UM287 (see Methods). To reveal the
spatial structure of the emission, we generated a new narrow-band
window that follows the velocity-shear structure (Fig. 2).
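
As a consistency check on the spectral setup described above, the following sketch computes where Lyα falls at z = 2.279, the velocity width that corresponds to a 2.5 Å resolution element, and the kLU-to-cgs conversion quoted in the Fig. 1 caption. The rest wavelength (1215.67 Å) and the physical constants are standard values that are not given in the text, and the function names are hypothetical.

```python
import math

C_KM_S = 299_792.458          # speed of light, km/s
C_CM_S = 2.99792458e10        # speed of light, cm/s
H_ERG_S = 6.62607015e-27      # Planck constant, erg s
LYA_REST_A = 1215.67          # rest-frame Lyman-alpha wavelength, Å (assumed)
ARCSEC2_PER_SR = (180.0 * 3600.0 / math.pi) ** 2

def observed_wavelength(rest_a, z):
    """Observed wavelength (Å) of a line emitted at rest_a from redshift z."""
    return rest_a * (1.0 + z)

def velocity_offset(obs_a, centre_a):
    """Line-of-sight velocity (km/s) of obs_a relative to centre_a."""
    return C_KM_S * (obs_a - centre_a) / centre_a

def klu_to_cgs(wavelength_a):
    """1 kLU (1,000 photon cm^-2 s^-1 sr^-1) in erg cm^-2 s^-1 arcsec^-2."""
    photon_energy = H_ERG_S * C_CM_S / (wavelength_a * 1e-8)
    return 1000.0 * photon_energy / ARCSEC2_PER_SR

lya_obs = observed_wavelength(LYA_REST_A, 2.279)
print(f"Observed Lya wavelength: {lya_obs:.1f} Å")
print(f"Resolution element: {velocity_offset(lya_obs + 2.5, lya_obs):.0f} km/s")
print(f"1 kLU = {klu_to_cgs(lya_obs):.3g} erg cm^-2 s^-1 arcsec^-2")
```

The observed wavelength lands inside both the 3,940-4,110 Å spectral range and the 3,970-4,000 Å narrow band, and the conversion factor comes out close to the quoted 1.18 × 10⁻¹⁹; the small difference depends on the wavelength adopted for the photon energy.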

The narrow-band and spectral images reveal an extended gaseous
disk that is offset from, and illuminated by, the nearby quasi-stellar
object QSO UM287 (which we call source A). The disk has a diameter
of approximately 125 proper kiloparsecs (pkpc), with a central deficit
with a diameter of about 25 pkpc. It is inclined by approximately 70° to
the plane of the sky. The 2D velocity profile is well fitted by a rotating
disk in a Navarro-Frenk-White (NFW) dark-matter halo with mass
log₁₀ M_h ≈ 13.1, where M_h is the mass of the halo in solar masses
(M⊙), halo concentration c ≈ 3 (unitless), circular velocity
(the maximum rotation velocity of the disk, at the virial radius)
v_c ≈ 500 km s⁻¹, and virial radius R_vir = 225 pkpc. Using a pseudo-slit
(Fig. 2), we extracted a 1D velocity profile (Fig. 3), which is also
well fitted by the same model, with consistent inferred mass
log₁₀ M_h ≈ 13.1 and halo concentration c ≈ 5. Both fits include
the effects of slit averaging, and the instrument and seeing point spread
functions (PSFs) that are measured using QSO A. The velocity profile
flattens at the lower end (in detector coordinates), but continues to
move blueward slowly at the upper end beyond the disk. The upper
end of the disk smoothly transitions into a filament of emission that
has a moderate velocity shear of about 100-150 km s⁻¹ over a length of
at least 125 pkpc. An extended disk of gas provides a natural explanation
for the unusual brightness of the QSO UM287 nebula, which,
when modelled as a cosmic web filament of typical column density,
required very high clumping factors. The clear disk-like morphology
and kinematics of the nebula provide evidence that the Lyα spectral
line is probing in situ kinematics.
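
The halo parameters quoted above can be cross-checked with a minimal NFW sketch. This is not the authors' fitting code, which also includes slit averaging and PSF convolution; the function names are hypothetical and the value of the gravitational constant in kpc (km/s)² per solar mass is a standard one that does not appear in the text.

```python
import math

G_KPC = 4.30091e-6  # gravitational constant, kpc (km/s)^2 / M_sun (assumed)

def nfw_mu(x):
    """Dimensionless NFW enclosed-mass function ln(1 + x) - x / (1 + x)."""
    return math.log(1.0 + x) - x / (1.0 + x)

def nfw_circular_velocity(r_kpc, m_vir_msun, r_vir_kpc, concentration):
    """Circular velocity (km/s) at radius r for an NFW halo."""
    x = concentration * r_kpc / r_vir_kpc
    m_enclosed = m_vir_msun * nfw_mu(x) / nfw_mu(concentration)
    return math.sqrt(G_KPC * m_enclosed / r_kpc)

M_VIR = 10 ** 13.1   # halo mass in solar masses, from the fit quoted above
R_VIR = 225.0        # virial radius in pkpc

for c in (3.0, 5.0):
    v_vir = nfw_circular_velocity(R_VIR, M_VIR, R_VIR, c)
    v_disk_edge = nfw_circular_velocity(62.5, M_VIR, R_VIR, c)
    print(f"c = {c:.0f}: v(R_vir) = {v_vir:.0f} km/s, v(62.5 pkpc) = {v_disk_edge:.0f} km/s")
```

At the virial radius the enclosed mass equals the halo mass whatever the concentration, so the velocity there reduces to the square root of G M_h / R_vir, which comes out just under 500 km s⁻¹ for the quoted mass and radius.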

We model the illumination of the disk by QSO source A and use
simple arguments to constrain the geometry (Fig. 4). We assume that
this QSO emits symmetrically into two coaxial cones, each with a solid
angle of π sr, with the rear cone illuminating the disk, and the front
cone including our line of sight. We estimate that under these conditions
the QSO can irradiate the disk and filament if the disk is more
than 65 pkpc behind the QSO.
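
The link between the solid angle per cone assumed above and the 60° half-angle quoted in the Fig. 4 caption follows from the standard cone formula, written out here as a short check (the symbols Ω and θ are introduced only for this illustration):

```latex
\Omega_{\rm cone} = 2\pi\,(1 - \cos\theta), \qquad
\theta = 60^{\circ} \;\Rightarrow\;
\Omega_{\rm cone} = 2\pi\left(1 - \tfrac{1}{2}\right) = \pi\ \mathrm{sr}, \qquad
\frac{2\,\Omega_{\rm cone}}{4\pi} = \frac{1}{2}.
```

Under this assumption the two cones together cover half of the sky as seen from the QSO.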

We used the ‘Cloudy’ nebular emission code to model the disk and
constrain the gas column density. We find that the gas is optically thin
in the Lyman continuum, and that the column density is a strong
function of disk thickness and flux only, as previously shown.
The brightness of the fluorescent Lyα emission suggests that the
dust-to-gas ratio in the disk is low (implying low metallicity), because
a substantial reduction in output flux at a hydrogen column density
N_H ≈ 10²¹ cm⁻² occurs at dust-to-gas ratios greater than 10⁻³-10⁻²
times the standard Milky Way ratio.

Because the inferred gas mass is proportional to the square root of
the disk thickness t_d, we require an estimate of t_d. For a self-gravitating
gas disk supported by thermal or turbulent pressure, the thickness is
related to the gas temperature T and column density N_H:

\[ t_d \approx 0.35 \left( \frac{T}{10^{4}\,\mathrm{K}} \right) \left( \frac{N_{\rm H}}{10^{21}\,\mathrm{cm^{-2}}} \right)^{-1} \mathrm{pkpc}. \]

For all the models in our parameter grid, the disk temperature
T ≈ 3 × 10⁴ K, which implies a disk thickness of at least 1 pkpc and a
plausible disk gas density n_H ≈ 0.5 t_d^−0.48 cm⁻³ (with t_d in pkpc).
Simulations that produce extended disks suggest a thickness that is less
than 10 pkpc. Such a disk would have three times the column density of
the 1-pkpc case and require turbulent velocity support of
σ_v ≈ 60 km s⁻¹ to maintain the 10-pkpc thickness (for a pressure-supported
disk, t_d ∝ σ_v² N_H⁻¹), which is comparable to the velocity
dispersions in high-redshift disks and is less than the approximately
80-90 km s⁻¹ range that would produce efficient shocking and radiative
cooling. Additionally, the presence of a clear emission deficit in
the central 25 pkpc suggests that the disk is much thinner than
25 cos(70°) ≈ 8.6 pkpc. Consequently, we take 1-10 pkpc as a reasonable
range and use t_d = 3 pkpc for all the mass estimates below,
which have errors of ±0.25 dex. We infer the column density as a
function of disk radius, assuming a 3-pkpc-thick disk (Fig. 3b).

Figure 2 | Spectral image and narrow-band image created from a
sheared-velocity window. a, Spectral image created with pseudo-slit (gridded
region in b) and full-field subtraction. v₀ is the systemic velocity of the
system. b, Narrow-band image formed using a sheared, Δv = 500 km s⁻¹ window
(bold white lines in a). The image and spectrum are consistent with a large
(15 arcsec or 125 pkpc), tilted (position angle of 15°), inclined (70°) disk
with v_c ≈ 450 km s⁻¹ and a central deficit (at grid position (station)
26 arcsec, that is, the grid location along the slit). The blueshifted end
of the disk transitions to an extended filament. c, Mean velocity in the
sheared-velocity (kinematic) window (trimmed to show velocities only where
emission exceeds 20% of the maximum value). d, e, Zooms of b and c,
respectively. f, 2D velocity model that is compared with c. Panel d has the
same colour scale as b and panels e and f have the same colour scale as c.
g, χ² map of the 2D model fit residuals; full-scale χ² = 5.

Figure 3 | Physical properties of the extended disk. a, Mean velocity
(symbols) along the pseudo-slit (Fig. 2) with respect to the disk central
velocity v₀, with vertical lines showing an approximately 150 pkpc extent of
the disk (dotted black lines), and virial radius (red dashed line). The
extent of the filament profile is 80 pkpc < R < 250 pkpc, where R is the
radial coordinate of the disk. The NFW profile fits with and without PSF
convolution are plotted in red and blue, respectively. Error bars are ±1σ.
b, Inferred hydrogen column densities N_H assuming t_d = 3 pkpc. Error bars
are ±1σ and do not include the uncertainty due to disk thickness. c, The
cumulative baryonic mass M_b (dots) and dark-matter mass M_d (red line)
within a sphere of radius R. d, Ratio of baryonic mass inside R to
dark-matter mass (symbols) and the canonical ratio of baryonic to
dark-matter mass of 0.17 (horizontal dotted line).

Figure 4 | Sketch of the geometry of the QSO-disk system. a, Line of sight
(LOS) view as seen by the telescope/PCWI showing both QSOs, the disk, the
filament, and the angular momentum vector for the disk, L. b, View normal to
the disk, for the case of a 125-pkpc distance between QSO A and the disk.
c, Approximate geometry (showing the QSO, the LOS and the QSO illumination
cones (60° half-angle, π sr per cone)), for the case of a 125-pkpc distance
between QSO A and the disk.

The estimates above allow us to compare the halo mass profile
derived from the rotation fits to the inferred baryonic gas mass profile
(Fig. 3c, d). We find that the total baryonic mass in the disk is

\[ M_b = 10^{11.2} \left( \frac{t_d}{3\,\mathrm{pkpc}} \right)^{0.52} M_\odot, \]

where M⊙ is the solar mass. The ratio of baryonic mass to total mass is
0.013 (t_d/3 pkpc)^0.52, which for t_d = 3 pkpc
is about 8% of the canonical ratio 0.17. As expected, the baryonic mass
is about 8% of the baryonic Tully-Fisher relation for v_c = 470 km s⁻¹.
The low baryon fraction in the cold disk component may imply
that either the baryons are lagging the dark-matter collapse or that a
substantial component is shocked to a much higher virial temperature.
However, the mass profiles in Fig. 3c, d are quite similar, with only a
small additional deficit in the inferred baryon ratio at the centre of the
disk. We estimate that the filament has a mass M_fil ≈ 10¹¹ M⊙, which is
comparable to the disk mass, and suggests that some of the baryons
associated with the halo will accrete in the future.

The integral field spectroscopic data also allow us to determine
the total angular momentum of the disk baryons:
L_b ≈ 3.2 × 10¹⁵ (t_d/3 pkpc)^0.52 M⊙ pkpc km s⁻¹. The specific angular
momentum j_b is independent of the disk thickness:
j_b = L_b/M_b = 2 × 10⁴ pkpc km s⁻¹. These
quantities, in conjunction with the halo parameters, allow us to determine
the normalized spin parameter: λ_b = j_b/(√2 R_vir v_vir) = 0.14,
where v_vir is the virial velocity. Typically, dark-matter halos have spin
parameters λ_d ≈ 0.04 at z = 0, although spin parameters are predicted
to be higher at z = 2, sometimes reaching λ_d ≈ 0.1. The baryonic spin
parameter that we measure is an additional 40% larger, suggesting that
the baryons in the extended disk have an angular momentum that is
much greater than that of the dark-matter halo. The large diameter and
very high angular momentum of the structure strongly suggest that the
extended disk is a ‘cold flow disk’, similar to those predicted by
simulations. This is the first directly imaged example of such an object,
to our knowledge.
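
The baryon fraction, specific angular momentum and spin parameter quoted above can be reproduced with a few lines of arithmetic. This is a sketch under stated assumptions: the text does not quote the virial velocity next to the formula, so the code tries the circular velocities mentioned elsewhere in the paper (450-500 km s⁻¹), and all names are hypothetical.

```python
import math

def spin_parameter(j, r_vir, v_vir):
    """Normalized spin parameter j / (sqrt(2) * R_vir * v_vir)."""
    return j / (math.sqrt(2.0) * r_vir * v_vir)

M_B = 10 ** 11.2      # baryonic disk mass for t_d = 3 pkpc, in M_sun
M_H = 10 ** 13.1      # halo mass from the rotation fits, in M_sun
L_B = 3.2e15          # baryonic angular momentum, M_sun pkpc km/s
R_VIR = 225.0         # virial radius, pkpc

baryon_fraction = M_B / M_H
j_b = L_B / M_B       # specific angular momentum, pkpc km/s

print(f"baryon fraction = {baryon_fraction:.4f} "
      f"({100 * baryon_fraction / 0.17:.0f}% of the canonical 0.17)")
print(f"j_b = {j_b:.3g} pkpc km/s")
for v_vir in (450.0, 470.0, 500.0):   # assumed range of virial velocities
    print(f"v_vir = {v_vir:.0f} km/s -> spin = {spin_parameter(j_b, R_VIR, v_vir):.3f}")
```

The quoted value of 0.14 is recovered for a virial velocity near 450 km s⁻¹; at 500 km s⁻¹ the same inputs give a value closer to 0.13.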
The extended disk of QSO UM287 does not have a detected stellar
component in the continuum image, although there is continuum
emission in the central part of the disk that coincides with the Lyα
emission deficit (source D, another bright continuum emission object,
is located almost exactly at the centre of the disk). The column density
that we estimate, N_H ≈ 2.5 × 10²¹ cm⁻², is above the star-formation
threshold for solar metallicity. Although feedback from the QSO
illumination could halt star formation for 1-10 Myr, the continuum
rest-frame far-ultraviolet emission probes a 100-Myr timescale. The lack of
star formation in the extended disk could be explained by a low
metallicity (Z < 0.1), which raises the star-formation threshold, owing to
the lower dust content and H₂-formation rate. Source D implies a central
star-formation rate of >15 M⊙ yr⁻¹; the Lyα emission deficit could be
caused either by increased dust absorption or by a gas deficit. The
central deficit could also be produced by preferential photoevaporation
by the QSO, or H I in the foreground (on the observer’s side of
the QSO) given that H I is abundant in the circumgalactic medium
around quasars.

Other possible causes of the observed velocity shear deserve com- 
ment, such as whether the extended disk is a result of the interactions 
that produced QSO source A and another bright object, QSO source B. 
We explore some alternatives in the Methods. QSO-merging disk 
models and observations imply, in general, a late appearance of the 
optical QSO phenomenon and only faint and fading signs of interac- 
tions. The large size, smooth kinematics and excellent fit to a simple 
disk model are not consistent with merging disks and tidal tails. The 
lack of any direct connection to QSO A or velocity shear nearing and 
crossing QSO B suggests that these objects are not directly linked or fed 
by the disk. The observations are best explained by an extended rotat- 
ing disk linked to a cosmic web filament. 


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
METHODS 


Instrument and observations. We have constructed an integral field 
spectrograph, called the Palomar Cosmic Web Imager (PCWI), that is designed 
to search for, map, and characterize intergalactic-medium emission and other 
low surface brightness phenomena. It is built with a 40″ × 60″ reflective 
image slicer with twenty-four 40″ × 2.5″ slices. PCWI is mounted at the 
Cassegrain focus of the Hale 5-m telescope on Mt Palomar, USA. The imaging 
resolution, while limited by the 2.5″ slicer sampling, can be effectively 
improved to approximately 1.3″ by dithering the field between individual 
exposures. A description of the instrument, general observing approach, and 
data analysis methodology is given in refs 8 and 9. 
For the UM287 observation, the spectrograph was fitted with a Richardson 
reflection grating, blazed near 5,000 Å, and has an instantaneous bandwidth of 
approximately 170 Å with the nod-and-shuffle mask in place. With this grating, 
the spectrograph attains a slit-limited resolution Δλ ~ 2.5 Å and a peak 
efficiency of about 4% at 4,000 Å including the telescope and atmosphere. 

We obtained a total of 2 h on-source and 2 h off-source exposure centred on 
QSO UM287 on 23 September 2014. Individual exposures were acquired using the 
nod-and-shuffle technique. The PCWI implementation of nod-and-shuffle 
uses the central third of the charge-coupled device (CCD) to record the spectrum, 
and masks the outer two thirds of the detector for storage (restricting the effective 
common bandpass to about 170 Å given the image slicer offset brick wall pattern 
and slit curvature). Individual exposure frames are created by interleaving an 
integral number, N, of on-target telescope pointings of length t with N + 1 
background pointings, the first and last of length t/2, and the remainder of 
length t. This cadence results in separate source and background tiles being 
built up on the CCD, each equivalent to an (N × t) exposure. A typical 40-min 
exposure (t = 120 s, N = 10; 20 min source and 20 min background) takes 
approximately 50 min of wall-clock time, including CCD read-out. The benefits 
of using the nod-and-shuffle method are that the sky is sampled frequently and 
nearly contemporaneously with the object, it is imaged through the same optical 
path as the object, and it is recorded using the same detector pixels as the 
object. This improves sky subtraction precision, reduces the contribution of 
detector read noise by decreasing the number of CCD read-outs, and limits the 
impact of instrument systematics. Pixel binning the detector 2 by 2 further 
reduces the impact of the three-electron (root mean square) read noise. 
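
The nod-and-shuffle cadence described above can be tallied with a short sketch. The function name is illustrative and overheads such as telescope nods and CCD read-out are ignored, so this only reproduces the on-sky integration times, not the 50 min of wall-clock time.

```python
def nod_and_shuffle_times(n_pointings, t_seconds):
    """Return (source, background) integration time in seconds for one frame.

    n_pointings on-target pointings of length t are interleaved with
    n_pointings + 1 background pointings; the first and last background
    pointings last t/2 and the remaining ones last t.
    """
    source = [t_seconds] * n_pointings
    background = ([t_seconds / 2.0]
                  + [t_seconds] * (n_pointings - 1)
                  + [t_seconds / 2.0])
    return sum(source), sum(background)

# Values quoted in the text: t = 120 s, N = 10.
source_s, background_s = nod_and_shuffle_times(10, 120.0)
print(source_s / 60.0, background_s / 60.0)  # minutes on source, on background
```

Both totals equal N × t, which is why the source and background tiles have matched depth.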
Data cube generation. We collected numerous calibration images using internal 
and dome lamps throughout the observing period. The obtained 2D spectra were 
processed with the PCWI/Keck CWI data reduction pipeline. They were rectified 
and aligned using geometric mask calibration images and arc lamp spectra. 
Background panels were subtracted from source panels. Individual 3D data cubes 
were then wavelength-shifted to compensate for the <1 Å of total flexure using 
sky lines. The final mosaicked and co-added data cubes were assembled using 
the astrometry on the basis of the QSO position. The reconstruction is accurate 
to about 0.5 arcsec (root mean square). 

Exposure maps were generated by processing normalized flat-field images 
(calibration and twilight flats) in a similar fashion. The result is a set of data cubes 
(RA, dec., and λ) for each exposure, sampled at (0.55″, 1.1 Å) and covering 
3,940-4,110 Å. As the nod-and-shuffle mask does not physically contact the CCD, a small 
amount of diffuse continuum light scatters underneath it and remains in the 
subtracted cube (<1%). This residual is easily subtracted with a low-order 
continuum fit. The data cube was normalized using the measured signal from the QSO 
UM287 and its measured spectrum at 4,000 Å from the Sloan Digital Sky Survey. 
The absolute flux measurements are accurate to ±10%. 

The data cube projections (other than Extended Data Fig. 1a-d) are produced 
from this difference flux cube by a 3D adaptive-smoothing algorithm, 
which incorporates a hierarchical adaptive-smoothing algorithm in space and 
wavelength. The 3σ noise threshold used for this algorithm is derived directly 
from the difference cube, and is consistent with the predicted Poisson noise. 
Raw spectral and narrow-band images compare well with the smoothed images 
(Extended Data Fig. 1, Fig. 1). 

Channel maps. As the filament approaches QSO UM287 (Fig. 1) the mean 
velocity shears red-ward and the dispersion broadens. A succession of 2-Å 
channel cuts through the smoothed data cube (Extended Data Fig. 2) illustrates 
a disk morphology with emission moving linearly towards the QSO (but offset) as the 
channels move from blue to red. These images also demonstrate that neither the 
QSO subtraction nor the presence of a nearby line and continuum object (source 
C) is responsible for the disk emission nearest QSO UM287. To reveal the spatial 
structure of the emission, we use the kinematic and spatial behaviour gleaned from 
the channel maps (Extended Data Fig. 2) to design a new narrow-band window 
that follows the velocity-shear structure. Using this technique we find evidence of 
an extended gaseous disk offset from, and illuminated by, the nearby QSO (Fig. 3). 
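
The correspondence between a 2-Å channel and the roughly 150 km s⁻¹ channel width quoted in the Extended Data captions follows from the Doppler relation Δv = c Δλ/λ. The sketch below is illustrative only; the reference wavelength of 3,985 Å is an assumption (the midpoint of the 3,970-4,000 Å band used elsewhere in the Methods).

```python
C_KM_S = 299792.458  # speed of light, km/s

def channel_width_kms(delta_lambda, lambda_obs):
    """Velocity width of a wavelength channel: dv = c * dlambda / lambda."""
    return C_KM_S * delta_lambda / lambda_obs

# Assumed reference wavelength: midpoint of the 3,970-4,000 Angstrom band.
width = channel_width_kms(2.0, 3985.0)
print(f"2-Angstrom channel ~ {width:.0f} km/s")
```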
Bright QSO subtraction. We generated several figures in the text (Fig. 3, 
Extended Data Figs 2 and 5) by subtracting an average image. In the case of 
spectral images, we calculated and then subtracted an average intensity versus slit 
position map. That is, we calculated a single flux at each slit position and then 
subtracted it from the spectral image; this can be thought of as a single-value 
continuum subtraction. In the case of narrow-band images, a single PSF is 
calculated by averaging over a wavelength range. This average image is subtracted from 
each narrow-band image. The wavelength range used is 3,958-4,013 Å, and 
includes the Lyα emission region (we include this to be conservative since it 
subtracts a small amount of extended emission, but we note it does not noticeably 
alter the results). In Extended Data Fig. 2, we subtract only the emission within an 
elliptical contour (reflecting the PSF) near the bright QSO. In Fig. 2, we subtract 
the average over the full spectral image in order to remove both QSOs and 
highlight the disk emission. For the narrow-band images in Extended Data Fig. 2 and 
Fig. 2, we calculate an average image for the bright QSO and subtract that from the 
narrow-band image, to highlight the disk emission. 
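
The single-value continuum subtraction described above for spectral images can be illustrated with a toy example. This is a minimal sketch with made-up numbers and a hypothetical function name, not the pipeline code: each row is one slit position, each column one wavelength bin, and the row mean is removed.

```python
def subtract_slit_average(spectral_image):
    """Subtract the wavelength-averaged flux at each slit position.

    spectral_image[i][k] is the flux at slit position i and wavelength bin k.
    """
    residual = []
    for row in spectral_image:
        mean_flux = sum(row) / len(row)
        residual.append([value - mean_flux for value in row])
    return residual

# Toy data (illustrative values only).
image = [
    [10.0, 10.0, 10.0, 10.0],  # pure continuum source
    [2.0, 2.0, 6.0, 2.0],      # faint continuum plus line emission in one bin
]
residual = subtract_slit_average(image)
print(residual)
```

In the second row the line survives but is slightly reduced, and the neighbouring bins go slightly negative, which mirrors the remark that including the line region in the average subtracts a small amount of emission.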

Nearby sources and source-subtraction residuals. It is important to 
demonstrate that the disk emission is not an artefact produced by the nearby QSO 
or the continuum source C. We show (Extended Data Fig. 3) the CWI, W. M. Keck 
Observatory narrow-band, and Keck V-band continuum image, along with 
locations of the QSOs (A and B) and two sources near QSO A (C and D). Source C is a 
fairly bright, compact continuum and line source near the QSO. Source D is a 
fainter, slightly extended continuum and Lyα line source also near the QSO. 
Sources C and D appear distinct in both the Keck continuum and the 
narrow-band images. Source D falls exactly in the disk emission minimum and centre. 
Source C is offset north from the centre of the disk (Fig. 2). Residuals following the 
elliptical contour are outside the subtraction area. When we perform no 
subtraction (Extended Data Fig. 4), we see that the excess emission associated with the 
disk and source C are still present in all channels, and in particular in the positive 
velocity channels. We can also remove the average emission from each entire 
image, not just near QSO A (Extended Data Fig. 5). Again the average is taken 
over the range 3,958-4,013 Å including the Lyα emission region. Source C is 
removed because it is a continuum source, except in the channels in which its line 
emission appears (Extended Data Fig. 5j, k). Considerable positive-velocity disk 
emission remains after removing the continuum, particularly in Extended Data 
Fig. 5h-j, k, and clearly exceeds any small remaining residuals due to small 
(around 1%) variations of the PSF with wavelength. This emission is well offset 
from source C. Finally, we can use a stepped vertical pseudo-slit to show a 
kinematically sheared profile that is consistent with a disk at several locations on and 
off QSO A (Extended Data Fig. 6). 

Kinematic modelling. We model the 2D velocity profile (Fig. 2c, e) with a 
Navarro-Frenk-White (NFW) dark-matter halo profile. We assume that the 
velocity profile is circular and dominated by dark matter with no stellar or 
baryonic contribution. We convolve the predicted 2D profile with the measured PSF 
based on the QSO A image. The 2D velocity profile is well fitted (Fig. 2f, g) by a 
rotating disk in an NFW dark-matter halo with mass log₁₀ M_h ~ 13.1, 
halo concentration c ~ 3, central velocity near systemic (v_0 = 50 ± 30 km s⁻¹), 
circular velocity (at the virial radius) v_c ~ 500 km s⁻¹, and virial radius 
R_vir = 225 pkpc. Using a pseudo-slit (Fig. 2) we extracted a 1D velocity profile 
(Fig. 3a), which is also well fitted by an NFW model with consistent inferred 
geometry, mass log₁₀ M_h ~ 13.1, and halo concentration c ~ 5. The 
blue-shifted end of the disk (v ~ −450 km s⁻¹) smoothly transitions to an extended 
filament (80 pkpc < R < 250 pkpc, crossing the faint QSO) with the same velocity 
of about −450 km s⁻¹. 
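
To make the halo model concrete, the sketch below evaluates the standard NFW circular-velocity curve and the enclosed mass implied by the quoted virial radius and circular velocity. It is not the authors' fitting code: the function names are illustrative, the value of G is the usual constant in kpc (km/s)² per solar mass, and treating the quoted v_c ~ 500 km s⁻¹ as the exact virial velocity is an assumption.

```python
import math

G = 4.30091e-6  # gravitational constant, kpc (km/s)^2 / Msun

def nfw_mu(y):
    """Dimensionless NFW enclosed-mass function: ln(1 + y) - y / (1 + y)."""
    return math.log(1.0 + y) - y / (1.0 + y)

def nfw_circular_velocity(r, r_vir, v_vir, c):
    """NFW circular velocity (km/s) at radius r (same length units as r_vir)."""
    x = r / r_vir
    return v_vir * math.sqrt(nfw_mu(c * x) / (x * nfw_mu(c)))

def virial_mass(r_vir_kpc, v_vir_kms):
    """Mass enclosed within r_vir for circular velocity v_vir: M = v^2 R / G."""
    return v_vir_kms ** 2 * r_vir_kpc / G

# Rounded parameters quoted in the Methods (assumed inputs).
r_vir, v_vir, c = 225.0, 500.0, 3.0

log_mass = math.log10(virial_mass(r_vir, v_vir))
v_at_100 = nfw_circular_velocity(100.0, r_vir, v_vir, c)
print(f"log10(M/Msun) ~ {log_mass:.2f}, v_c(100 kpc) ~ {v_at_100:.0f} km/s")
```

With these inputs the enclosed mass comes out near 10^13.1 solar masses, in line with the fitted halo mass quoted above.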

Intensity modelling. To constrain the geometry of the disk with respect to the 
LOS and the QSO, we set up a simple model (Fig. 4). We assume that the QSO has 
two opposite emission cones each with π sr (60° cone half-angle). The line of sight 
is included in one of these cones, and the disk and the filament feeding the disk are 
included in the other. The disk inclination is estimated from the axis ratio and the 
2D kinematic model fit to be about 70° +5°. The requirement that the rear 
emission cone illuminates all of the visible disk and filament places a lower limit 
on the separation between the disk and the QSO of 65 pkpc. We note that QSO B 
could also produce some of the illumination, but we neglect that here. 
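
The solid angle of the assumed emission cones can be checked directly: a cone of half-angle θ subtends Ω = 2π(1 − cos θ), which for θ = 60° gives π sr. The function name below is illustrative.

```python
import math

def cone_solid_angle(half_angle_deg):
    """Solid angle (sr) of a cone with the given half-angle in degrees."""
    return 2.0 * math.pi * (1.0 - math.cos(math.radians(half_angle_deg)))

omega = cone_solid_angle(60.0)
print(omega / math.pi)  # solid angle in units of pi steradians
```

Two such cones together cover half of the full sphere (2π of 4π sr).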

We use the ‘Cloudy’ nebular emission code (version 11.10) to estimate the 
column density and constrain the gas density of the illuminated disk. We use a 
standard QSO spectrum with luminosity log₁₀(νL_ν) = 46.9, black-body 
temperature T = 1.5 × 10⁵ K, and power-law indices α_ox = −1.4 (for optical-to-X-ray), 
α_uv = −0.5 (for UV), and α_x = −1.0 (for X-ray). We populate a grid of fixed 
number density models with total gas column density log₁₀ N_H = {20.5, 
20.6, ..., 22.5}, distance between the disk and QSO A R = {75, 125, ..., 275} pkpc, 
disk thickness t_d = {1, 3, 10, 30} pkpc, disk turbulent velocity σ_t = {0, 300} km s⁻¹, 
and disk metallicity [Z/H] = {0, −1, −2}. For each of these models a Lyα flux is 
calculated, with contributions from recombination radiation, line scattering, and 
collisional excitation also determined. We find that the model predicts that intensity 
depends strongly only on disk thickness (disk density) and hydrogen column 
density, as expected, since the neutral fraction (n_HI/n_H) scales as the square root of 
the gas density. The model predicts that the gas is optically thin in the Lyman 
continuum, and thus column density depends strongly only on disk thickness 
and flux, as previously shown. For recombination radiation we expect 
N_H ∝ (I t_d)^0.5, where I is the intensity; the resulting model nearly replicates 
this dependence: Ny = 10721 495206 RO. g,°-07(Z /H] °°, where fo is the Lye 
flux in units of 3 X 10°!” erg cm *s arcsec 7 and Rigo is the distance between 
the QSO and the disk in units of 100 kpc. The neutral hydrogen column density 
N_HI is relatively independent of disk thickness, and varies strongly with QSO/disk 
separation, ranging from Ny = 10'S? cm ? for Ryoo = 0.75 to Ny = 10'”° cm * 
for R_100 = 2.0. In the regime determined from the data, the neutral fraction is low 
(≪1) and most of the Lyα emission is produced by radiative recombination (as 
opposed to line scattering or collisional excitation). The model is insensitive to 
whether the disk is thermally or turbulently supported, because, in the model grid, 
the thickness and column density are fixed to enable us to derive the intensity, and 
the line emissivity is only a weak function of the turbulent velocity σ_t. 
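
The parameter grid described in this section can be enumerated as a simple Cartesian product. The sketch below only builds the list of parameter combinations; it does not run Cloudy, and the variable names and the 50-pkpc distance step (inferred from the quoted sequence 75, 125, ..., 275) are assumptions.

```python
import itertools

# Grid axes as quoted in the text (names are illustrative).
log_column_density = [round(20.5 + 0.1 * i, 1) for i in range(21)]  # 20.5 ... 22.5
distance_pkpc = list(range(75, 276, 50))   # 75, 125, 175, 225, 275 (assumed step)
thickness_pkpc = [1, 3, 10, 30]
turbulent_velocity_kms = [0, 300]
metallicity_dex = [0, -1, -2]

grid = list(itertools.product(
    log_column_density,
    distance_pkpc,
    thickness_pkpc,
    turbulent_velocity_kms,
    metallicity_dex,
))
print(len(grid), "models; first:", grid[0])
```

Each tuple would correspond to one fixed-density model for which a Lyα flux is computed.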

The brightness of the fluorescent Lyα suggests that the dust-to-gas ratio in the 
disk is low, because substantial reduction in output flux at N_H ~ 10²¹ cm⁻² occurs 
with dust-to-gas ratios greater than 10-*-10° 7 times the standard Milky Way 
ratio’. Finally, we use the average intensity at each point of the pseudo-slit in Fig. 2 
to determine Ny as a function of the other parameters. We estimate the filament 
mass using the same model. Here, the filament thickness is taken to be 24 pkpc, on 
the basis of a filament width of approximately 3 arcsec, length of 160 pkpc, and a 
mean N_H ~ 7 × 10²⁰ cm⁻² (Fig. 3). The filament mass is then M_fil ~ 10¹¹ M⊙ 
for a filament thickness t_f = 24 pkpc. 

We compute the angular momentum of the disk by assuming that the fitted 
velocity profile (intrinsic, before blurring and averaging) and column density 
profile apply to the full 180° associated with each half of the disk. 

Radiative transfer. Optically thick Lyα radiation can produce a double-peaked 
profile as line photons escape from the line centre by frequency random walk. 
The optical depth τ₀ for an H I column density N_HI ~ 10¹⁷ cm⁻² is τ₀ ~ 3,000, 
which for a gas temperature T ~ 10⁴-10⁵ K (and equivalent turbulent velocity) 
gives a double-peak separation less than 1.5 Å. This separation would not be 
resolved in our observation. Under these conditions, for line photons escaping a 
relatively kinematically quiet disk (σ_t < 50 km s⁻¹), and because we do not expect 
substantial velocity shears along the line of sight, the line centroid should reflect 
the average local gas velocity. 
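
The quoted line-centre optical depth can be bracketed with the commonly used approximation for the thermally broadened Lyα cross-section, σ₀ ≈ 5.9 × 10⁻¹⁴ (T/10⁴ K)^(−1/2) cm². This expression is a standard textbook value and is not taken from the paper; the function name is illustrative.

```python
def lya_line_centre_optical_depth(n_hi_cm2, temperature_k):
    """Line-centre Lya optical depth for column N_HI (cm^-2) at temperature T (K).

    Uses sigma_0 ~ 5.9e-14 * (T / 1e4 K)**-0.5 cm^2 (standard approximation,
    assumed here rather than taken from the paper).
    """
    sigma_0 = 5.9e-14 * (temperature_k / 1.0e4) ** -0.5
    return n_hi_cm2 * sigma_0

tau_cool = lya_line_centre_optical_depth(1.0e17, 1.0e4)
tau_hot = lya_line_centre_optical_depth(1.0e17, 1.0e5)
print(f"tau_0 between {tau_hot:.0f} and {tau_cool:.0f}")
```

For T between 10⁴ and 10⁵ K the result brackets the value of about 3,000 quoted above.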

Alternative models. Since the UM287 disk is found in proximity to QSO A (and 
B), it is possible that it was produced by the same interactions that produced QSO 
A (and possibly B), or in interactions within the halo that the object could share with 
QSO A (and possibly B). Simulations of tidal tails in cold dark matter models 
suggest that long tails of about 250 pkpc can be formed. Here we make some 
general observations, and in this discussion refer to the UM287 nebula as being 
composed of the bright filament (Fig. 2, at station 20-35 arcsec) and the faint 
filament (Fig. 2, station 35-50 arcsec). Luminous QSOs at high redshift are 
probably formed from major mergers of two gas-rich disks. The optical QSO phase is 
late in the evolutionary sequence of such a merger and tends to show only faded 
and faint evidence of interactions such as tidal tails. Our kinematic 
observations of the UM287 nebula are well fitted by a smooth rotating disk connected to a 
quiescent filament with low velocity shear. The intensity distribution is also disk- 
like. The inferred gas column densities (10*'° cm~*) and gas masses (10''Mo) are 
high. The span of the bright filament is 120 pkpc and of the entire filament is 
several hundred proper kiloparsecs. 

We consider three classes of alternative models. First, an undetected disk could 
be interacting with the host of QSO B, leading to a merging disk and long tidal tails 
illuminated by QSO A. However, there is no evidence in Lyα for emission or the 
complex kinematics of a merging disk at, or near, QSO B. Long tidal tails are 
usually fairly thin and curved, fade away from the interaction region, and show 
continuous velocity shear as various parts of the tail expand, rotate, and fall back. 
Neither the intensity, morphology, nor velocity profiles of the UM287 filament 
exhibit this behaviour. No emission is seen from an opposing tail. 

A second possibility is that the bright part of the filament is part of a merging 
disk hosting QSO A or the tidal tail, and that the faint filament is part of the tidal 
tail. Several issues arise with this picture. The bright filament is tangential to QSO 
A with no detected connection, whereas a tidal tail would be expected to project 
radially outward and then curve tangentially, all in a plane that includes the host 
merging disk and QSO A. It is difficult to illuminate the bright filament with QSO 
A unless their separation is at least 100 pkpc, which is inconsistent with the 
possibility of the bright filament being part of the merging disks. There is no 
evidence for complex merging disk/tail kinematics, for example large velocity 
shears where the tidal tail meets the merging disks, or curvature in 
velocity-position space of the tail. There are no continuum counterparts (other than D) 
to the filament, as usually seen in tidal tails. The filament is quite wide (approxi- 
mately 60 pkpc) compared to observed and modelled tails. Gas moving outward in 
a tidal tail would be likely to show a kinematic discontinuity with the outer disk 
rotation curve. Typically, the base of the tail is closer to the systemic velocity of the 
merging system (as well as the end, with a noticeable curvature in between), 
because this gas (farthest from the centre of mass at perigalacticon) has suffered 
a tidal impulse that launches the tidal tail. Although there is no direct way to 
determine whether the gas in the filament is in-falling from the cosmic web or 
moving outward in a tidal tail, the lack of this kinematic signature is further 
(indirect) evidence that the gas is flowing in and smoothly merging with the outer 
parts of the disk. 

A third possibility is that the filament is produced by an interaction that does 
not produce the QSOs but occurs in the same halo. In this case, the main purpose 
of positing the interaction is to explain the large size of the filament. Again, the 
simple and smooth kinematics, the large width, the high gas mass, and the disk-like 
morphology suggest that this is neither a merging disk system nor a tidally disrupted 
disk passing through the halo of QSO A. 

Thus, although we cannot investigate all possible interaction geometries and 
scenarios here, a merger/tidal scenario is not favoured by our observations. 
Code availability. We choose not to make the pipeline code available at this time 
because it is not fully documented for public use but plan to do so in early 2016. 


20. Sembach, K. R. & Tonry, J. L. Accurate sky subtraction of long-slit spectra: velocity 
dispersions at Σ_V = 24.0 mag/arcsec². Astron. J. 112, 797-805 (1996). 

21. Glazebrook, K. & Bland-Hawthorn, J. Microslit nod-shuffle spectroscopy: 
a technique for achieving very high densities of spectra. Publ. Astron. Soc. Pacif. 
113, 197-214 (2001). 

22. Cuillandre, J. C. et al. “Va-et-Vient” spectroscopy: a new mode for faint object CCD 
spectroscopy with very large telescopes. Astron. Astrophys. 281, 603-612 (1994). 

23. Navarro, J. F., Frenk, C. S. & White, S. D. M. A universal density profile from 
hierarchical clustering. Astrophys. J. 490, 493-508 (1997). 

24. Cantalupo, S., Porciani, C., Lilly, S. J. & Miniati, F. Fluorescent Lyα emission from the 
high-redshift intergalactic medium. Astrophys. J. 628, 61-75 (2005). 

25. Cantalupo, S., Porciani, C. & Lilly, S. J. Mapping neutral hydrogen during 
reionization with the Lyα emission from quasar ionization fronts. Astrophys. J. 672, 
48-58 (2008). 

26. Barnes, J. E. Encounters of disk/halo galaxies. Astrophys. J. 331, 699-717 (1988). 

27. Springel, V. & White, S. D. M. Tidal tails in cold dark matter cosmologies. Mon. Not. 
R. Astron. Soc. 307, 162-178 (1999). 

28. Toomre, A. & Toomre, J. Galactic bridges and tails. Astrophys. J. 178, 623-666 
(1972). 

29. Hopkins, P. F., Hernquist, L., Cox, T. J. & Keres, D. A cosmological framework for the 
co-evolution of quasars, supermassive black holes, and elliptical galaxies. I. Galaxy 
mergers and quasar activity. Astrophys. J. 175 (Suppl.), 356-389 (2008). 

30. Guyon, O., Sanders, D. B. & Stockton, A. Near-infrared adaptive optics imaging of 
QSO host galaxies. Astrophys. J. 166 (Suppl.), 89-127 (2006). 

31. Hutchings, J. B. Host galaxies of z ~ 4.7 quasars. Astron. J. 125, 1053-1059 
(2003). 

32. Hutchings, J. B., Cherniawsky, A., Cutri, R. M. & Nelson, B. O. Host galaxies of two 
micron all sky survey-selected QSOs at redshift over 0.3. Astron. J. 131, 680-685 
(2006). 

33. Kawakatu, N., Anabuki, N., Nagao, T., Umemura, M. & Nakagawa, T. Type I 
ultraluminous infrared galaxies: transition stage from ULIRGs to QSOs. Astrophys. 
J. 637, 104-113 (2006). 

34. Urrutia, T., Lacy, M. & Becker, R. H. Evidence for quasar activity triggered by galaxy 
mergers in HST observations of dust-reddened quasars. Astrophys. J. 674, 80-96 
(2008). 

35. Hibbard, J. E., van der Hulst, J. M., Barnes, J. E. & Rich, R. M. High-resolution H I 
mapping of NGC 4038/39 (“The Antennae”) and its tidal dwarf galaxy 
candidates. Astron. J. 122, 2969-2992 (2001). 


©2015 Macmillan Publishers Limited. All rights reserved 




Extended Data Figure 1 | Illustration of raw and conventionally smoothed 
data. a, Raw-data spectral image of pseudo-slit obtained in the slit shown 
in b. 1σ error is about 3 kLU or approximately one colour scale step. b, 
Raw-data narrow-band image obtained in the 3,970-4,000 Å band. 1σ error is 
50 kLU or about 0.5 colour scale steps. c, Raw spectral image shown in 
a, conventionally boxcar smoothed by 10 pixels. d, Raw narrow-band image 
shown in b, conventionally boxcar smoothed by 10 pixels. e, Spectral 
image obtained by 3D adaptive smoothing as discussed in the Methods. 
f, Narrow-band image (3,970-4,000 Å) obtained by summing over the 3D 
adaptively smoothed data cube, as discussed in the Methods. 
Extended Data Figure 2 | Channel maps of the UM287 data cube. a-l, Panels 
show individual velocity channels that are 150 km s⁻¹ wide, corresponding 
to a 2 Å width. Sources A-D near QSO UM287 are plotted. Velocities are with 
respect to the UM287 systemic velocity. QSO A has been subtracted by 
calculating an average PSF for the QSO over the 3,970-4,000 Å band, and then 
subtracting this slice by slice, within an elliptical radius of 6 arcsec in the x 
direction and 7.2 arcsec in the y direction. The residual flux in certain channels 
is due to three effects. (1) At the centre of QSO A, the residual flux is due 
to emission lines in the QSO around Lyα. (2) At the elliptical boundary 
surrounding the QSO, a small subtraction residual can be seen outside the 
ellipse within which the subtraction is performed. This is typically 3-5 kLU, and 
can be seen clearly without additional sources in c and d. (3) Emission sources 
are present near the QSO in certain channels. Source C shows line emission 
primarily in j and k (430 km s⁻¹ < v < 731 km s⁻¹), and its continuum 
emission is subtracted along with the QSO. Any emission near the subtraction 
boundary above 5 kLU is not a subtraction residual due to the QSO. The 
disk emission appears bright in b southeast of QSO A, and moves north, 
approaching the QSO as the velocity moves redward. In the two central velocity 
channels, the emission appears almost ring-like and continues to move north, 
roughly centred on source D. The emission continues to move north in 
h-j. The emission is 15-25 kLU and therefore cannot be QSO subtraction 
residuals. The emission is also not partially subtracted emission from source C. 
The emission in i and j is several arcseconds east of source C. Further evidence 
for this point is given in Extended Data Figs 4 and 5. To indicate the 
approximate disk location, we show an elliptical contour with a major axis 
radius of 8.5 arcsec, position angle of 15.5°, and an ellipticity corresponding 
to an inclination of 70°. 
Extended Data Figure 3 | Comparison of PCWI data and Keck narrow-band 
and continuum images. a, PCWI narrow-band image created by 
summing 3,970-4,000 Å data-cube slices from the adaptively smoothed data 
cube. Sources A, B, C, and D are shown. The PCWI image is not continuum 
subtracted. b, Keck continuum-subtracted narrow-band image on the same 
intensity scale as the PCWI image in a. c, Keck V-band image. Sources A, C, and 
D are shown; continuum magnitudes in the V band are approximately (+2 
mag) 16.6 AB, 22.2 AB, and 23.8 AB respectively. d, Keck continuum- 
subtracted narrow-band image on an expanded intensity scale. 




[Extended Data Figure 4 image: channel-map panels with axes Δx [arcsec] and 
Δy [arcsec]. Recoverable panel velocity ranges, in km s⁻¹: −916 < v < −766; 
−766 < v < −615; −615 < v < −465; −465 < v < −314; −314 < v < −164; 
−13 < v < 136; 136 < v < 287; 287 < v < 437; 437 < v < 588; 588 < v < 738; 
738 < v < 889.] 

Extended Data Figure 4 | Channel maps of the UM287 data cube. Panels 
show individual velocity channels that are 150 km s⁻¹ wide, corresponding to a 
2 Å width, as in Extended Data Fig. 2. In this case, no source subtraction has 
been performed. Sources near the QSO are plotted. Ellipses are drawn as in 
Extended Data Fig. 2. Velocities are with respect to the UM287 systemic 
velocity. 
Extended Data Figure 5 | Channel maps of the UM287 data cube. 
a-l, Individual velocity channels that are 150 km s⁻¹ wide, corresponding to a 
2 Å width, as in Extended Data Fig. 2. In this case, the wavelength-averaged 
cube has been subtracted from the cube over the full field of view. Sources near 
the QSO are plotted. Ellipses are drawn as in Extended Data Fig. 2. Velocities 
are with respect to the UM287 systemic velocity. This subtraction removes the 
average continuum from all sources, including source C. The emission centred 
on source C is present in j and k from line emission (presumably Lyα). The 
progression of the disk emission can be seen as in Extended Data Fig. 2 from 
v = −700 km s⁻¹ to v = +600 km s⁻¹. 
Extended Data Figure 6 | Spectral image and rectangular pseudo-slit slices 
of the UM287 data cube. a-c, The spectral image shown at left is in the same 
format as Fig. 1, and the narrow-band image that is obtained using a 
−600 km s⁻¹ < v < 600 km s⁻¹ velocity cut is shown on the right with the 
corresponding slit location from which the spectral image is obtained. The 
vertical slits are 2.5 arcsec wide, and are positioned at 0 arcsec, 3.75 arcsec, and 
5.0 arcsec in the positive x direction (northeast) with respect to the 0 reference 
position. The average QSO A spectrum has been subtracted in the bright 
regions. In each spectral image, strong emission appears with a large, 
quasi-linear velocity shear centred on the QSO velocity. Also, the narrow-band image 
in this velocity range illustrates that, as the emission approaches the QSO, 
there is an offset to the northeast that is not consistent with a direct entry into 
the QSO. d-f, The same plots with the minimum intensity range set to a 
negative value to show negative residuals from the QSO subtraction. 

Invariance under the charge, parity, time-reversal (CPT) transformation 
is one of the fundamental symmetries of the standard 
model of particle physics. This CPT invariance implies that the 
fundamental properties of antiparticles and their matter-conjugates 
are identical, apart from signs. There is a deep link between CPT 
invariance and Lorentz symmetry (that is, the laws of nature seem 
to be invariant under the symmetry transformation of spacetime), 
although it is model dependent. A number of high-precision CPT 
and Lorentz invariance tests (using a co-magnetometer, a torsion 
pendulum and a maser, among others) have been performed, but 
only a few direct high-precision CPT tests that compare the 
fundamental properties of matter and antimatter are available. Here 
we report high-precision cyclotron frequency comparisons of a 
single antiproton and a negatively charged hydrogen ion (H⁻) carried 
out in a Penning trap system. From 13,000 frequency measurements 
we compare the charge-to-mass ratio for the antiproton, (q/m)_p̄, 
to that for the proton, (q/m)_p, and obtain (q/m)_p̄/(q/m)_p − 1 = 
1(69) × 10⁻¹². The measurements were performed at cyclotron 
frequencies of 29.6 megahertz, so our result shows that the CPT 
theorem holds at the atto-electronvolt scale. Our precision of 69 parts 
per trillion exceeds the energy resolution of previous 
antiproton-to-proton mass comparisons as well as the respective figure of merit 
of the standard model extension by a factor of four. In addition, we 
give a limit on sidereal variations in the measured ratio of <720 
parts per trillion. By following the arguments of ref. 11, our result can 
be interpreted as a stringent test of the weak equivalence principle of 
general relativity using baryonic antimatter, and it sets a new limit on 
the gravitational anomaly parameter of |α_g − 1| < 8.7 × 10⁻⁷. 

The standard model is the theory that describes particles and their
fundamental interactions, although without taking into account
gravitation. However, this model is known to be incomplete, which has
inspired searches for physics beyond the standard model, such as tests
of CPT invariance that compare the fundamental properties of matter
and antimatter equivalents at the lowest energies and with the greatest
precision. For leptons, for example, the magnetic anomalies of the
electron and positron were compared with a fractional uncertainty
of about 2 parts per billion, and by applying similar techniques to
protons and antiprotons, the resulting g-factor (a proportionality
constant which links the spin of a particle to its magnetic moment)
comparison reached a precision of 4.4 parts per million. We are
planning to improve this measurement by at least a factor of a
thousand. In this context, we recently reported the most precise and
first direct high-precision measurement of the proton magnetic moment,
with a fractional precision of 3.3 parts per billion. Complementary to
these efforts, spectroscopic comparisons of hydrogen and antihydrogen
are underway; recent progress has been made at CERN.
The most precise test of CPT invariance with baryons is the
comparison of the proton and antiproton charge-to-mass ratios. By
measuring the cyclotron frequencies $\nu_c = (qB_0)/(2\pi m)$ of single
trapped antiprotons and H⁻ ions in a Penning trap with magnetic
field $B_0$, the TRAP collaboration achieved a fractional precision of
90 parts per trillion.

In our measurements we also compare the cyclotron frequencies of a
single antiproton and an H⁻ ion. H⁻ is used as a proxy for the proton;
the negative charge facilitates the experiment by eliminating the need
to invert trap voltages. Our advanced Penning trap system enables
adiabatic particle exchange within 15 s, which is much faster than in
previous antiproton-to-proton $q/m$ comparisons. Our fast exchange rate
allows for individual particle-to-antiparticle comparison cycles of
only four minutes. This high-precision mass spectrometry method enabled
us to perform about 6,500 direct frequency ratio comparisons within a
total measuring time of 35 days. Moreover, our measurements have been
carried out in thermal equilibrium with the detection system at
5.2(1.1) K, where systematic frequency shifts are small.

Our cryogenic Penning-trap system, which consists of a measurement
trap and a reservoir trap, is shown in Fig. 1. It is mounted in the
horizontal bore of a superconducting magnet at $B_0 = 1.946$ T, the axis
of the magnet being oriented at 60° with respect to the Earth’s rotation
axis. Both traps have an inner diameter of 9 mm and are arranged in
the five-electrode orthogonal and compensated design discussed in
ref. 21. Transport electrodes connect the individual traps; they allow
for fast adiabatic particle shuttling along the trap axis. The entire
assembly is mounted in an indium-sealed cylindrical vacuum chamber
with a volume of 1.2 litres. Once cooled to 4 K, ultralow pressures are
reached, and thus antiproton storage lifetimes of more than a year are
achieved. To measure the particle’s oscillation frequencies by image
current detection we used highly sensitive superconducting resonant
detection coils. The measurement trap detector is operated at a
resonance frequency of $\nu_{\mathrm{res}} = 645{,}262$ Hz, has an
inductance of 1.72 mH, and a quality factor of 11,300.

Antiprotons are delivered in bunches by the Antiproton Decelerator
of CERN; the antiproton bunches also release hydrogen from our
degrader structure. H⁻ ions are produced either by asymmetric
dissociation of H₂ or by electron capture (a detailed study of the
production mechanism has yet to be performed). Typically 100 to 350
cold antiprotons and about a third that number of H⁻ ions are prepared
per ejection from the Antiproton Decelerator. From this particle cloud
we extract a single antiproton and keep it in the centre of the trap,
as well as an H⁻ ion, which is parked in the downstream park electrode.
The other particles are shuttled to the reservoir trap. In case
particles are lost during experiments, the measurement trap is reloaded
from this reservoir.

In charge-to-mass ratio comparisons cyclotron frequencies $\nu_c$ are
measured. We use the invariance theorem

$$\nu_c^2 = \nu_+^2 + \nu_z^2 + \nu_-^2 \qquad (1)$$

which relates the characteristic trap frequencies $\nu_+$, $\nu_-$ and
$\nu_z$ to $\nu_c$.
Comparisons of the antiproton-to-H⁻ charge-to-mass ratio are
equivalent to a direct antiproton-to-proton comparison, but systematic
shifts caused by polarity switching of the trapping voltages are
avoided. The mass of the negative hydrogen ion is

$$m_{\mathrm{H}^-} = m_p\left(1 + 2\frac{m_e}{m_p} + \frac{\alpha_{\mathrm{pol},\mathrm{H}^-}B_0^2}{m_p} - \frac{E_b}{m_p} - \frac{E_a}{m_p}\right) \qquad (2)$$

where $m_e/m_p$ is the electron-to-proton mass ratio,
$\alpha_{\mathrm{pol},\mathrm{H}^-}B_0^2/m_p$ is the polarizability
shift, $E_b/m_p$ is the electron binding energy and $E_a/m_p$ is the
electron affinity of hydrogen. If CPT invariance holds, the expected
cyclotron frequency ratio is
$R = (\nu_c)_{\bar p}/(\nu_c)_{\mathrm{H}^-} = (q/m)_{\bar p}/(q/m)_{\mathrm{H}^-} = 1.001089218754(2)$,
the precision being limited by the accuracy of our knowledge of the
proton mass.

Figure 1 | Schematic of the measurement and the reservoir Penning traps.
A resonant superconducting detection inductor is connected to each trap.
Radio-frequency drives for particle manipulation are applied to the upstream
correction electrode of the measurement trap. The upstream and downstream
park electrodes are used for the particle shuttling scheme applied in the q/m
ratio comparisons. In addition, the degrader structure that slows down the
5.3-MeV antiprotons from CERN’s Antiproton Decelerator is shown on the left.
The electron gun shown on the right provides particles for the electron cooling
of antiprotons. The entire assembly is mounted in a cryogenic vacuum chamber
(see text).

Figure 2 | Illustration of the measurement procedure. a, Detailed illustration
of the measurement cycle. b, Potential configuration for antiproton cyclotron
frequency measurement. The blue ellipse represents the particle reservoir.
c, Potential configuration for H⁻ cyclotron frequency measurement.

A detailed measurement cycle is shown in Fig. 2a. Our measurements
are triggered by the antiproton injection into the Antiproton
Decelerator; this avoids systematic ratio shifts induced by beats
between the measurements and ambient field fluctuations caused by
the Antiproton Decelerator cycle. Immediately after the injection
trigger, the magnetron motion of the antiproton is cooled for 10 s (ref. 18).
Then the antiproton’s axial frequency $\nu_{z,\bar p}$ is measured,
followed by a determination of the modified cyclotron frequency at
$\nu_{+,\bar p} = 29.656$ MHz, which is extracted from a sideband
measurement as described in ref. 18. These two measurements take 30 s
and 48 s, respectively. The particle’s magnetron frequency is
determined by evaluating
$\nu_{-,\bar p} \approx \nu_{z,\bar p}^2/(2\nu_{+,\bar p}) = 7.02$ kHz,
and thus $\nu_{c,\bar p} \approx 29.663$ MHz is obtained.
Next, we ramp the trapping potential from the configuration shown in
Fig. 2b to that shown in Fig. 2c, and adjust the trapping voltage by
5 mV to tune the axial oscillation frequency of the H⁻ ion into
resonance with the superconducting detector. Afterwards, we perform
similar measurements with the hydrogen ion to determine
$\nu_{c,\mathrm{H}^-} \approx 29.635$ MHz. Thus, a single ratio
comparison takes exactly two Antiproton Decelerator cycles,
corresponding to a typical measuring time of 220–240 s. To obtain from
all performed measurements the final experimentally determined
frequency ratio
$R_{\mathrm{exp}} = \nu_{c,\bar p}/\nu_{c,\mathrm{H}^-} = (q/m)_{\bar p}/(q/m)_{\mathrm{H}^-}$,
we reject those data points measured when magnetic field changes caused
by activities in the Antiproton Decelerator accelerator hall were
observed. These changes are identified by an array of
giant-magnetoresistance, Hall and flux-gate magnetic field sensors.
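
The frequency bookkeeping in this cycle can be checked numerically. The
following Python sketch takes the axial frequency to be equal to the
quoted detector resonance of 645,262 Hz (an assumption, made because
the particle is tuned into resonance with the detector), uses the
quoted modified cyclotron frequency of 29.656 MHz, estimates the
magnetron frequency and then applies the invariance theorem (1). The
function names are illustrative only.

```python
import math


def magnetron_estimate(nu_z: float, nu_plus: float) -> float:
    """Approximate magnetron frequency, nu_- ~ nu_z^2 / (2 nu_+)."""
    return nu_z ** 2 / (2.0 * nu_plus)


def free_cyclotron(nu_plus: float, nu_z: float, nu_minus: float) -> float:
    """Invariance theorem: nu_c^2 = nu_+^2 + nu_z^2 + nu_-^2."""
    return math.sqrt(nu_plus ** 2 + nu_z ** 2 + nu_minus ** 2)


nu_z = 645_262.0    # Hz; assumed equal to the detector resonance frequency
nu_plus = 29.656e6  # Hz; quoted modified cyclotron frequency

nu_minus = magnetron_estimate(nu_z, nu_plus)
nu_c = free_cyclotron(nu_plus, nu_z, nu_minus)

print(f"nu_- = {nu_minus / 1e3:.2f} kHz, nu_c = {nu_c / 1e6:.3f} MHz")
```

Under these assumptions the sketch reproduces the quoted values of
about 7.02 kHz and 29.663 MHz.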

Subsequently the ratios are processed as follows. To remove
systematic ratio shifts caused by the intrinsic magnetic-field drift
$(1/B_0)(\Delta B/\Delta t) = -5(1) \times 10^{-9}$ per hour of our
superconducting magnet, we compute the antiproton cyclotron frequencies
$\langle\nu_{c,\bar p,k}\rangle(t) = \nu_{c,\bar p,k} + (\nu_{c,\bar p,k+1} - \nu_{c,\bar p,k})/(t_{k+1} - t_k)\,t$,
where $k$ is the index of individual measurements. Subsequently we
evaluate the upper ratio
$\langle\nu_{c,\bar p,k}\rangle(t_{\mathrm{H}^-,k})/\nu_{c,\mathrm{H}^-,k}$,
where $t_{\mathrm{H}^-,k}$ is the centre time of the
$\nu_{c,\mathrm{H}^-,k}$ determination. In addition, the reciprocal
ratios $\nu_{c,\bar p,k}/\langle\nu_{c,\mathrm{H}^-,k}\rangle(t_{\bar p,k})$
are evaluated in a similar way. From the processed data we extract the
mean of the frequency ratio by performing a maximum-likelihood fit
of a Gaussian distribution to the upper ratio as well as to the reciprocal
results and calculating the average of both.
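
The effect of the interpolation step can be illustrated with a short
Python sketch. It is not the analysis code used for the measurement:
the data are synthetic, both species are assumed to see the same purely
linear fractional field drift, and the 120 s spacing and all names are
hypothetical. The time argument of the interpolation is measured from
the earlier antiproton measurement.

```python
def interpolate(t, t0, f0, t1, f1):
    """Linearly interpolate a frequency between two measurements."""
    return f0 + (f1 - f0) / (t1 - t0) * (t - t0)


def upper_ratios(t_pbar, nu_pbar, t_h, nu_h):
    """Interpolated antiproton frequency divided by each H- frequency.

    Assumes time-ordered, interleaved data:
    t_pbar[k] < t_h[k] < t_pbar[k + 1].
    """
    ratios = []
    for k in range(min(len(t_h), len(t_pbar) - 1)):
        nu_interp = interpolate(t_h[k], t_pbar[k], nu_pbar[k],
                                t_pbar[k + 1], nu_pbar[k + 1])
        ratios.append(nu_interp / nu_h[k])
    return ratios


# Synthetic data set (all values below are illustrative assumptions).
R_TRUE = 1.001089218754   # ratio built into the synthetic data
DRIFT = -5e-9 / 3600.0    # fractional field drift per second
NU_H0 = 29.635e6          # H- cyclotron frequency at t = 0, in Hz


def nu_h_at(t):
    return NU_H0 * (1.0 + DRIFT * t)


def nu_pbar_at(t):
    return R_TRUE * nu_h_at(t)


t_pbar = [0.0, 240.0, 480.0, 720.0]
t_h = [120.0, 360.0, 600.0]
nu_pbar = [nu_pbar_at(t) for t in t_pbar]
nu_h = [nu_h_at(t) for t in t_h]

ratios = upper_ratios(t_pbar, nu_pbar, t_h, nu_h)
naive = [nu_pbar[k] / nu_h[k] for k in range(len(t_h))]

print(max(abs(r - R_TRUE) for r in ratios))  # interpolated: negligible bias
print(naive[0] - R_TRUE)                     # uncorrected: drift-induced bias
```

For a strictly linear drift the interpolated ratios recover the
built-in value, whereas dividing frequencies taken 120 s apart leaves a
bias of order $10^{-10}$, which is larger than the statistical
precision reported here.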

Figure 3 | Results and data analysis. a, All measured antiproton-to-H⁻
cyclotron frequency ratios as a function of time. Within 35 days, 6,521 frequency
ratios were measured. Fraction, fraction of total counts (number measured in
one bin normalized to the number of all measurements). b, Measured ratios
projected to a histogram. c, Power spectrum density of the ratios as a function of
reciprocal measurements. d, Allan deviation of the measured ratios including a
fit to the data with slope −0.501(2), which confirms that the ratio fluctuations
follow a Gaussian white noise distribution. e, Allan deviation of cyclotron
frequency measurements. Details are discussed in the text.

To estimate the uncertainty of the mean we evaluate the correlation
matrix of the extracted ratios and calculate the standard error of the
cross-correlated data. This avoids underestimation of the error caused
by frequencies which are, owing to the linear averaging approach, used
in multiple ratios. All measured upper ratios are shown in Fig. 3a and b.
The full time sequence is shown in Fig. 3a, and the results projected to a
histogram are shown in Fig. 3b. Breaks in the time sequence are due to
maintenance of the apparatus or systematic measurements. In Fig. 3c
and d, the power spectrum density and the Allan deviation of all
measured data points are shown. The mean of the power spectrum
density is constant, while a linear fit to the double-log plot of the Allan
deviation gives a slope of $\alpha = -0.501(2)$. This confirms the Gaussian
white-noise nature of the ratio fluctuations and justifies our data
analysis.
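
Why a slope close to −0.5 indicates white noise can be seen from a
minimal Python sketch. It uses a simple non-overlapping Allan deviation
and synthetic Gaussian noise; the estimator variant, the sample size
and the seed are assumptions made for illustration and are not taken
from the analysis described above.

```python
import math
import random


def allan_deviation(y, m):
    """Non-overlapping Allan deviation of series y for averaging factor m."""
    n_bins = len(y) // m
    means = [sum(y[i * m:(i + 1) * m]) / m for i in range(n_bins)]
    diffs = [(means[i + 1] - means[i]) ** 2 for i in range(n_bins - 1)]
    return math.sqrt(0.5 * sum(diffs) / len(diffs))


def loglog_slope(xs, ys):
    """Least-squares slope of log(ys) against log(xs)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(v) for v in ys]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den


random.seed(1)  # fixed seed so that the sketch is repeatable
noise = [random.gauss(0.0, 1.0) for _ in range(8192)]  # synthetic white noise

factors = [1, 2, 4, 8, 16, 32, 64, 128]
adev = [allan_deviation(noise, m) for m in factors]
slope = loglog_slope(factors, adev)

print(f"fitted slope: {slope:.2f}")
```

For uncorrelated noise, averaging over m points reduces the scatter as
$1/\sqrt{m}$, so the double-log plot has slope −1/2; a drift or a
random walk would bend the curve upwards at long averaging times.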


For further verification we evaluate the Allan deviation of our
cyclotron frequency measurements, which is shown in Fig. 3e as the red
data points. A model combining a linear drift of −5(1) parts per billion
per hour (dashed line in Fig. 3e), a white-noise contribution with a
root-mean-square width of 160(15) mHz per cycle (dash-dotted line in
Fig. 3e), and a random walk generating a Gaussian distribution of
220(20) mHz per cycle (dotted line in Fig. 3e) reproduces the
experimental results. From this model we simulate data, add random
offsets $\Delta$, and blind-analyse the simulation data. Within error
bars the offsets $\Delta$ are reproduced, which independently justifies
our data analysis. The antiproton-to-H⁻ charge-to-mass ratio extracted
from this data evaluation is

$$R_{\mathrm{exp}} = 1.001089218872(64) \qquad (3)$$

To further check both the data analysis and the experiment we also
evaluate the cyclotron frequency ratios for $\bar p$-to-$\bar p$ and
H⁻-to-H⁻ instead of $\bar p$-to-H⁻, but measured within subsequent
cycles. For these direct comparisons of identical particles

$$R_{\mathrm{exp,id}} - 1 = -3(79) \times 10^{-12} \qquad (4)$$

is obtained, so that $R_{\mathrm{exp,id}}$ is consistent with 1. The
increased statistical error in the second case is caused by the random
walk in the magnetic field, which leads to slightly higher ratio
fluctuations owing to the doubled time interval between subsequent
measurements of identical particles.

Several systematic corrections enter into the measured antiproton-
to-H⁻ charge-to-mass ratio $R_{\mathrm{exp}}$. The dominant systematic
shift is related to particle exchange and the 5-mV detuning of the
trapping voltage, which is required to tune the axial oscillation
frequencies of both particles to the centre of the axial detector.
Slightly different contact and offset potentials as well as machining
imperfections are present at each individual trap electrode. Thus, the
change of the ring voltage causes a relative shift of the
antiproton-to-H⁻ equilibrium position. In the presence of a magnetic
gradient term of $B_1 = 7.58(42)$ mT m⁻¹ (where $B_1$ is the strength of
the magnetic gradient) the cyclotron frequencies $\nu_{c,\bar p}$ and
$\nu_{c,\mathrm{H}^-}$ are hence measured at slightly different magnetic
fields, leading to a systematic ratio shift. In our measurement, the
adjustment of the trapping voltage shifts the H⁻ ion towards lower
magnetic fields, and $R_{\mathrm{exp}}$ has to be corrected by −114(26)
parts per trillion. A detailed discussion of this dominant systematic
shift is provided in the Supplementary Information.

The systematic uncertainty arises from the uncertainty in the
determination of the offset voltages and the magnetic gradient $B_1$. In
addition to the particle position, the magnitude of the octupolar
correction $C_4$ of the trapping potential also changes when the trap
voltage is adjusted. The resulting ratio correction is −3(1) parts per
trillion, the error being due to uncertainties extracted from potential
theory and the determination of the axial mode temperature $T_z$ of the
particle. Finally, the stability of our rubidium frequency reference
contributes a systematic scatter of 3 parts per trillion per ratio
comparison. In summary, the ratio has to be corrected by −117(26) parts
per trillion, leading to our final result

$$R_{\mathrm{exp,c}} = 1.001089218755(64)(26) \qquad (5)$$
which corresponds to an antiproton-to-proton charge-to-mass ratio of

$$\frac{(q/m)_{\bar p}}{(q/m)_p} - 1 = 1(64)(26) \times 10^{-12} \qquad (6)$$

and is in agreement with CPT conservation.

In the framework of the standard model extension developed by
ref. 29 a figure of merit
$r = (1 - R_{\mathrm{exp,c}})\,h\nu_{c,\mathrm{H}^-}/(m_{\mathrm{H}^-}c^2)$
is derived, which characterizes the sensitivity of our measurement with
respect to the CPT-violating terms added to the standard-model Lagrange
density. Our result sets a new limit of $r < 9 \times 10^{-27}$. This
exceeds the previous limit by a factor of four, and probes the standard
model at an energy scale of 8 atto-electronvolts. In terms of energy
sensitivity our result is the most stringent test of CPT invariance
with baryonic antimatter performed so far.

In addition to the above discussion, our high data accumulation rate
enables the search for sidereal variations, which might be mediated by
cosmological background fields. To this end, the data set is processed
using lock-in filter and Allan-deviation-based data analysis. A diurnal
variation of the results would appear as a peak in the lock-in spectrum
at a period of the sidereal day, which is 86,164.1 s. We do not observe
such an indication and from our data analysis we conclude that the
amplitude of any diurnal variation in $R_{\mathrm{exp}}$ is <0.72 parts
per billion at the 0.95 confidence level.

By following the arguments of ref. 11 our measurements can be
interpreted as a test of weak equivalence. If the proton and antiproton
charge-to-mass ratios are identical in the absence of a gravitational
field, then the particle and antiparticle cyclotron clocks have the same
frequencies. However, if matter respects weak equivalence while
antimatter experiences an anomalous coupling to the gravitational field,
the cyclotron frequencies of the two particles will experience different
gravitational redshifts when moved to the surface of the Earth. We
follow ref. 11, in which a possible gravitational anomaly acting on
antimatter is expressed by a parameter $\alpha_g$, which modifies the
effective newtonian gravitational potential $U$ to give $\alpha_g U$.
This contributes a possible difference in the measured cyclotron
frequencies of
$(\nu_{c,p} - \nu_{c,\bar p})/\nu_{c,p} = -3(\alpha_g - 1)U/c^2$. Thus,
our measurement sets a new upper limit of
$|\alpha_g - 1| < 8.7 \times 10^{-7}$.
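
The arithmetic linking equations (3), (5) and (6) to the quoted energy
scale can be reproduced with the Python sketch below. It rests on
several assumptions that are not stated in this form above: the
statistical and systematic uncertainties are combined in quadrature,
the bound entering the figure of merit is taken to be the 69 parts per
trillion fractional precision, the H⁻ rest energy is approximated by
the proton rest energy multiplied by the expected ratio R, and the
values of the Planck constant and the proton rest energy are standard
reference values inserted for illustration.

```python
import math
from decimal import Decimal

R_EXP = Decimal("1.001089218872")  # equation (3)
R_CPT = Decimal("1.001089218754")  # expected ratio if CPT invariance holds
PPT = Decimal("1e-12")             # one part per trillion

# Systematic corrections in parts per trillion: position shift, octupolar term.
corrections_ppt = [Decimal(-114), Decimal(-3)]
total_ppt = sum(corrections_ppt)

r_exp_c = R_EXP + total_ppt * PPT  # central value of equation (5)
deviation = r_exp_c / R_CPT - 1    # central value of equation (6)

# Assumed quadrature combination of the (64) and (26) uncertainties.
combined_ppt = math.hypot(64.0, 26.0)

# Energy sensitivity and figure of merit (constants are assumed inputs).
H_EV_S = 4.135667696e-15  # Planck constant in eV s
M_P_EV = 938.272e6        # proton rest energy in eV
NU_C_H = 29.635e6         # H- cyclotron frequency in Hz (quoted above)
BOUND = 69e-12            # fractional precision taken as the bound

energy_ev = BOUND * H_EV_S * NU_C_H
figure_of_merit = energy_ev / (M_P_EV * float(R_CPT))

print(total_ppt, r_exp_c)
print(f"{float(deviation):.1e}", round(combined_ppt))
print(f"{energy_ev:.1e} eV", f"{figure_of_merit:.1e}")
```

Under these assumptions the sketch gives a total correction of −117
parts per trillion, a corrected ratio of 1.001089218755, a fractional
deviation of about $1 \times 10^{-12}$ with a combined uncertainty of
about 69 parts per trillion, an energy scale of about 8
atto-electronvolts and a figure of merit of about $9 \times 10^{-27}$.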
When a spatially uniform temperature change is imposed on a
solid with more than one phase, or on a polycrystal of a single,
non-cubic phase (showing anisotropic expansion-contraction),
the resulting thermal strain is inhomogeneous (non-affine).
Thermal cycling induces internal stresses, leading to structural
and property changes that are usually deleterious. Glasses are the
solids that form on cooling a liquid if crystallization is avoided;
they might be considered the ultimate, uniform solids, without the
microstructural features and defects associated with polycrystals.
Here we explore the effects of cryogenic thermal cycling on glasses,
specifically metallic glasses. We show that, contrary to the null
effect expected from uniformity, thermal cycling induces rejuvenation,
reaching less relaxed states of higher energy. We interpret
these findings in the context that the dynamics in liquids become
heterogeneous on cooling towards the glass transition, and that
there may be consequent heterogeneities in the resulting glasses.
For example, the vibrational dynamics of glassy silica at long
wavelengths are those of an elastic continuum, but at wavelengths less
than approximately three nanometres the vibrational dynamics are
similar to those of a polycrystal with anisotropic grains. Thermal
cycling of metallic glasses is easily applied, and gives improvements
in compressive plasticity. The fact that such effects can be achieved
is attributed to intrinsic non-uniformity of the glass structure,
giving a non-uniform coefficient of thermal expansion. While
metallic glasses may be particularly suitable for thermal cycling,
the non-affine nature of strains in glasses in general deserves
further study, whether they are induced by applied stresses or by
temperature change.

In glasses there are no equivalent atoms, and elastic deformation
must be non-affine. For metallic glasses this has been explored through
atomistic simulations and diffraction-based measurements of the
pair distribution function (see, for example, the review in ref. 7).
Compared to their crystalline counterparts, metallic glasses have lower
elastic moduli, associated with local atomic rearrangements occurring
even well within the nominally elastic regime. While these
rearrangements (shears) contribute to apparently elastic strain, they
are not individually reversible, and there is net structural change. In
atomistic simulations, shear events are localized in ~1 nm soft spots
with low-frequency vibrational modes, and can be associated with the
range of density and stiffness of nearest-neighbour clusters (that is,
the shell around a central atom).

Measurements of metallic glass properties do not have such fine
spatial resolution, but non-uniformity is still clearly detected. Strongly
and weakly bonded regions have been postulated to account for ~5 nm
zones of accelerated crystallization. The initial yield load (in
nano-indentation) shows a dispersion of values associated with
structural heterogeneity. Mapping of heterogeneity by atomic-force
microscopy, over areas much larger than in simulation, shows variations
of energy dissipation and of elastic modulus (~30%) with correlation
lengths of 2.5–20 nm.

In metallic glasses under elastic loading, non-affine strains lead to
local structural change; this led us to question whether thermal strains
show analogous effects. In elastic deformation, the effects of non-affine
strain and modulus variation are strongest for shear. For thermal
expansion/contraction of a glass, however, the macroscopic strain
is hydrostatic. In simulations of amorphous iron, the local shear
modulus G shows a relative standard deviation of 27%; the local bulk
modulus B shows a smaller, but still substantial, value of 18%. This
variation is likely to be greater for the range of bonding found in a
multicomponent system. As the coefficient of thermal expansion
(CTE) is, other factors being constant, inversely proportional to B
(ref. 15), we expect that it also shows significant variation. Noting that
neighbouring regions of different CTE must satisfy elastic
compatibility, atomic-level shears must develop on temperature change,
and given the existence of soft spots, local structural change may occur.
On annealing, metallic glasses undergo thermally activated structural
relaxation to states of higher density and lower enthalpy. To avoid such
effects, thermal cycling in the present work is only from near room
temperature (293–343 K) to lower temperatures, always far below
the glass-transition temperature $T_g$, near which thermal relaxation
occurs (Fig. 1).
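
How a spread in the local bulk modulus carries over to the CTE can be
sketched with a small Monte Carlo estimate in Python. The Gaussian
shape of the modulus distribution, the truncation at three standard
deviations, the sample size and the seed are all assumptions made for
illustration; only the 18% relative standard deviation and the inverse
proportionality are taken from the text.

```python
import random
import statistics

random.seed(0)     # fixed seed so that the sketch is repeatable
REL_STD_B = 0.18   # relative standard deviation of the local bulk modulus


def sample_bulk_modulus() -> float:
    """Draw a local bulk modulus (mean 1), rejecting draws beyond 3 sigma."""
    while True:
        b = random.gauss(1.0, REL_STD_B)
        if abs(b - 1.0) <= 3.0 * REL_STD_B:
            return b


moduli = [sample_bulk_modulus() for _ in range(20000)]
cte = [1.0 / b for b in moduli]  # CTE taken as proportional to 1/B

rel_std_cte = statistics.pstdev(cte) / statistics.mean(cte)
print(f"relative standard deviation of CTE: {rel_std_cte:.2f}")
```

To first order the relative spread of 1/B equals that of B, so under
these assumptions the local CTE would vary by roughly a fifth of its
mean value from region to region.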

As a metallic glass is heated towards $T_g$, there is exothermic
structural relaxation (Fig. 2a). The relaxation spectrum (heat release
rate as a function of temperature) can be detected by differential
scanning calorimetry (DSC). While the shape of the spectrum may be of
interest, we focus here only on the overall heat of relaxation
($\Delta H_{\mathrm{rel}}$, shaded areas in Fig. 2b). Changes in
$\Delta H_{\mathrm{rel}}$ have been used to characterize the
effects of thermal and mechanical treatments on metallic glasses; for
example, heavy plastic deformation (shot-peening) can induce ageing
(that is, relaxation to a lower-energy state) or rejuvenation.
For lanthanum (La)-based metallic glasses, cycling between room
temperature (293 K) and liquid-nitrogen temperature (77 K) leads
to increases in $\Delta H_{\mathrm{rel}}$, that is, rejuvenation
(Fig. 2b). For melt-spun ribbons, $\Delta H_{\mathrm{rel}}$ peaks at
~10 cycles; for bulk glass, $\Delta H_{\mathrm{rel}}$ is smaller,
peaking at ~15 cycles; in each case, the maximum increase in
$\Delta H_{\mathrm{rel}}$ is ~50% relative to the as-cast sample.
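
The heat of relaxation is the area between two specific-heat traces,
which can be estimated by numerical integration. The Python sketch
below is only an illustration: the traces are synthetic, the exotherm
is modelled as a triangular dip in the first trace relative to a flat
second trace, and the temperature grid, units and function name are
assumptions rather than the procedure described in Methods.

```python
def heat_of_relaxation(temps, cp_first, cp_second):
    """Trapezoidal integral of (cp_second - cp_first) over temperature."""
    total = 0.0
    for i in range(len(temps) - 1):
        d0 = cp_second[i] - cp_first[i]
        d1 = cp_second[i + 1] - cp_first[i + 1]
        total += 0.5 * (d0 + d1) * (temps[i + 1] - temps[i])
    return total


def dip(t):
    """Synthetic exotherm: triangular dip, 10 units deep, centred at 400 K."""
    return 10.0 * (1.0 - abs(t - 400.0) / 50.0)


temps = [350.0 + i for i in range(101)]  # 350 K to 450 K in 1 K steps
second = [25.0] * len(temps)             # flat trace of the relaxed sample
first = [b - dip(t) for b, t in zip(second, temps)]

dh_rel = heat_of_relaxation(temps, first, second)
print(dh_rel)  # area of the triangle, in (specific-heat unit) x K
```

With specific heat in J mol⁻¹ K⁻¹ and temperature in K, the integral
has units of J mol⁻¹, the unit in which $\Delta H_{\mathrm{rel}}$ is
reported.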

We next addressed whether such changes are due to cycling, or to
the time spent at 77 K. Ribbons were held at 77 K for 4 h, much longer
than the sum of the times spent at 77 K in cycling treatments, but this
‘anneal’ did not give a detectable change in $\Delta H_{\mathrm{rel}}$
(see Methods and Extended Data Fig. 1).

Because surfaces respond more rapidly than interiors, sample
temperature is not spatially uniform during thermal cycling; surfaces
must be in tension during cooling and in compression during heating. In
nano-indentation, stress cycling can harden metallic glasses, raising
the question of whether the $\Delta H_{\mathrm{rel}}$ changes (Fig. 2b)
are due to the temperature cycling itself, or to the associated
stresses. As-cast rods of La₅₅Ni₁₀Al₃₅ bulk metallic glass (BMG) were
subjected to room temperature–77 K cycles at the same time as thin discs
cut from the rods. As detailed in Methods, the changes in
$\Delta H_{\mathrm{rel}}$ are independent of sample size, showing that
stresses generated during cycling have a negligible effect (Extended
Data Fig. 2). Thermal calculations (Extended Data Table 1) suggest that
the stresses themselves are negligible, consistent with this conclusion.

Figure 1 | Thermal cycling of metallic glasses. a, At high fractions of the
glass-transition temperature $T_g$, down to as low as 0.6$T_g$, metallic glasses
can undergo α and β relaxations, giving peaks in loss modulus as a function of
$T/T_g$ as shown by the curve. b, The thermal cycling in the present work explores
lower temperatures. Samples are cycled from near room temperature,
293–343 K (RT), to liquid-nitrogen temperature (77 K). The ranges of $T/T_g$
reflect both the range of temperature and the range of $T_g$ for the metallic
glasses, from 474 K for La₅₅Ni₁₀Al₃₅ (at.%) up to 700 K for Cu₄₆Zr₄₆Al₇Gd₁.
c-e, Schematic depictions of the degree of heterogeneity in a metallic glass in
the as-cast state (c), and after increasing numbers of thermal cycles (d, e). The
population and intensity of soft spots (dark), with lower elastic stiffness and
higher CTE, increase with cycling. The scale of these heterogeneities is not
given by the present results (so no scale bars are given), but mapping of elastic
properties suggests characteristic lengths of <10 nm.

Many aspects of thermal cycling could be tailored: upper and lower 
temperatures, holding times at these temperatures, rates of cooling 
and heating, number of cycles. We report preliminary studies in 
Methods; our focus here, however, is on property changes induced 
by thermal cycling. 

In instrumented indentation (nano-indentation) of metallic
glasses, initial yielding is indicated by a sharp ‘pop-in’ (increase
in indentation depth h) on the load-h curve (Extended Data Fig. 3),
corresponding to shear-banding onset. Cumulative distributions of
initial yield pressure $P_y$ (Fig. 3a) for La₅₅Ni₂₀Al₂₅ glass ribbon in
three states permit the effects of a longer hold at 77 K to be compared
with cycling. The median value of $P_y$, 2.98 GPa in the as-cast sample,
decreases by 3% after a 10-min hold at 77 K, and by a further 17% after
ten room temperature–77 K (1 min) cycles. After the cycling treatments
the relative width (1st to 9th decile) of the $P_y$ distribution
(±15%) is greater than in the as-cast sample (±7%), suggesting that
cycling induces greater heterogeneity.
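
A relative width of this kind can be computed from a set of indentation
results as sketched below in Python. The sketch assumes that the quoted
± value is half of the 1st-to-9th decile range divided by the median,
which is one plausible reading and is not confirmed by the text; the
input values are synthetic and the function name is illustrative.

```python
import statistics


def decile_summary(values):
    """Return the median and the relative half-width of the 1st-9th decile range."""
    deciles = statistics.quantiles(values, n=10, method="inclusive")
    median = statistics.median(values)
    half_width = 0.5 * (deciles[8] - deciles[0]) / median
    return median, half_width


# Synthetic yield pressures in GPa, evenly spaced from 2.0 to 3.0.
pressures = [2.0 + 0.01 * i for i in range(101)]

median, half_width = decile_summary(pressures)
print(f"median = {median:.2f} GPa, relative width = +/-{100 * half_width:.0f}%")
```

For the evenly spaced synthetic values the deciles fall at 2.1 and
2.9 GPa around a median of 2.5 GPa, which gives a relative width of
±16%.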

The corresponding distributions of hardness H (Fig. 3b), determined
from h at full load, show similar trends. The median H of the
as-cast sample, 2.65 GPa, is 1% lower after 10 min at 77 K, and a
further 4% lower after ten room temperature–77 K cycles. For a given
sample, the H distribution is narrower than that for $P_y$; for example,
after cycling, the relative width is only ±3% (one-fifth that of $P_y$).
Evidently H, reflecting conditions for continuance of flow, is much
less sensitive to local heterogeneities than is $P_y$. The Young modulus,
determined on unloading, shows similar behaviour to H (Extended
Data Fig. 4).

Figure 2 | Differential scanning calorimetry (DSC) of melt-spun ribbons of
La₅₅Ni₂₀Al₂₅ and bulk rods of La₅₅Ni₁₀Al₃₅ metallic glasses. a, DSC traces
on heating show an exotherm before the glass transition at $T_g$ (here for ribbon).
b, By subtraction of successive traces of specific heat $C_p$ (see Methods), the
heat of relaxation, $\Delta H_{\mathrm{rel}}$, associated with the exotherm can be
determined (shaded areas) and is increased after room temperature–77 K thermal
cycles. In both a and b, vertical bars give units and scales for the y axes.
c, For ribbons, $\Delta H_{\mathrm{rel}}$ peaks at ~10 room temperature–77 K
cycles; for discs (250–500 µm thick) cut from the bulk rod,
$\Delta H_{\mathrm{rel}}$ is lower and peaks at ~15 cycles; the data
points are for individual measurements and the changes lie outside the error
range of ±30 J mol⁻¹ (standard deviation estimated as in Methods).

The distribution curves of $P_y$ and H confirm changes related to the
number of cycles rather than the holding time at 77 K. The marked
reduction in $P_y$ induced by cycling is accompanied by a reduction in
initial yield increments (Δh, Extended Data Fig. 4a).

Metallic glasses show exceptionally high yield stress and yield
strain, and can be very tough, but there is a clear need to improve
their plasticity. Annealing reduces their plasticity, often inducing
brittleness. Figures 2 and 3 show property changes opposite to those
expected from annealing, suggesting that thermal cycling gives
rejuvenation. The effects on macroscopic plasticity are explored for two
BMG compositions tested in uniaxial compression (Fig. 4). The
plastic strain of 1.5-mm-diameter samples of Zr₆₂Cu₂₄Fe₅Al₉ is
improved (from 4.9% to 7.6%) by successive thermal cycles (Fig. 4a).
Improvements in plasticity, smaller in absolute terms (for reasons
given in Methods), but larger relatively, are seen in larger-diameter
samples of the same glass (Fig. 4b), which also show a reduction
in microhardness (by 4%, inset), similar to that for the median
hardness of the La-based glass in Fig. 3b. Cu₄₆Zr₄₆Al₇Gd₁ BMG
(1.5 × 1.5 × 3.0 mm³ cuboid) shows an increase in plastic strain from
1.4% in the as-cast state to 5.1% after ten room temperature–77 K
cycles (Fig. 4c). A sample fully relaxed by annealing is brittle and is
not successfully rejuvenated by cycling, but a strong rejuvenation effect
is possible after partial thermal relaxation, when thermal cycling gives
an improvement in plastic strain similar to that in as-cast samples.

Well below $T_g$, plastic flow in metallic glasses is localized into thin
shear bands. Macroscopic plasticity is limited by premature failure on
one or a few dominant shear bands. The plasticity can be improved by
proliferation of shear bands, making the overall flow more uniform.
The shear-band spacing of ~10 µm in the as-cast sample (Fig. 4d) is
reduced to ~2.5 µm in the thermally cycled sample, suggesting greater
ease of shear-band initiation, consistent with reductions in $P_y$
(Fig. 3a) and in initial yield displacements (Extended Data Fig. 4a).
Resonant ultrasound spectroscopy (RUS) was used to measure the elastic
moduli of Cu₄₆Zr₄₆Al₇Gd₁ BMG, which show no detectable change on
thermal cycling (Extended Data Fig. 5).

Figure 3 | Cumulative distributions of (a) initial yield pressure $P_y$ and
(b) hardness H. La₅₅Ni₂₀Al₂₅ metallic glass ribbon is tested in instrumented
indentation, to a maximum load of 40 mN, in three states: as-cast, after a
10-min hold at 77 K, and after a further ten room temperature–77 K cycles each
with 1 min hold. The cycling reduces $P_y$ and H, and widens the distribution
of their values. The widening is taken to represent increasing heterogeneity of
the glass structure (Fig. 1c-e).

Across a range of metallic glass compositions, melt-spun and bulk,
thermal cycling induces changes in properties. As the cycling is down
to 77 K, these changes can be compared with the property improvements
(hardness, wear resistance) obtained by deep cryogenic treatment
(DCT) of steels. DCT, however, works through phase changes
(transformation of residual austenite, precipitation of fine carbides) on
holding at low temperature. In the present case, there is no discernible
phase change (X-ray diffraction confirms that the samples remain fully
glassy after cycling; Extended Data Fig. 6), and the effects are more
analogous to those induced in single-phase polycrystalline non-cubic
metals, for example, thermal-cycling growth in uranium and
increased dislocation density in zirconium. The widespread use of
DCT does, however, suggest that cycling to cryogenic temperatures
may be a practicable process.

Figure 4 | Improved plasticity after thermal cycling. a, 1.5-mm-diameter rod
samples of Zr₆₂Cu₂₄Fe₅Al₉ BMG (yield stress 1.68 ± 0.05 GPa) show an
increase in plastic strain under uniaxial compression when treated with 338 K
to 77 K thermal cycles. b, The compressive plasticity of this glass is strongly
dependent on sample dimension, but for all the diameters tested, thermal
cycling leads to an increase in plastic strain, accompanied (inset) by a reduction
in microhardness (in kg mm⁻²; each data point is the average of 20
measurements). c, 1.5 × 1.5 × 3.0 mm³ cuboids of Cu₄₆Zr₄₆Al₇Gd₁ BMG (yield
stress 1.61 ± 0.04 GPa), both as-cast and annealed (1.0 h at 400 °C) to
partial relaxation, show an increase in plastic strain under uniaxial compression
after ten room temperature–77 K cycles; for full relaxation (1.5 h at 400 °C), the
sample remains brittle. d, For the Cu₄₆Zr₄₆Al₇Gd₁ BMG samples, scanning
electron microscopy shows that the population density of shear bands near the
dominant shear band is lower in an as-cast sample (top) than in a similar
sample tested after ten room temperature–77 K cycles (bottom). In a and
c, horizontal bars indicate the scale and units for the x axes; quoted yield stress
gives a guide to the y axes.

The changes in ΔH_rel (Fig. 2c) show rejuvenation followed by ageing,
opposing trends known in processes such as irradiation or plastic
deformation, which introduce both damage (raising the energy of the
system) and mobility (enabling relaxation to lower energy). Shot-peening
increases ΔH_rel in an annealed BMG, but for an as-cast
BMG even this intense plastic deformation reduces ΔH_rel (ref. 17).
Thermal cycling increases ΔH_rel even for a rapidly quenched metallic
glass. The maximum increase in ΔH_rel (Fig. 2c), 340 J mol⁻¹, is (correcting
for differing DSC heating rates) ~60% of the increase in ΔH_rel
in a Zr-based BMG induced by one rotation in high-pressure torsion,
and reductions in hardness (Figs 3b and 4b and ref. 26) are, pro rata,
similar. The increases in ΔH_rel (Fig. 2c) are very similar to those seen in
elastostatic loading; it is remarkable that both these apparently mild
processes give effects comparable with heavy plastic deformation, suggesting
that non-affine strains in the nominally elastic regime are
efficient in generating structural damage (disordering).

Although continued cycling leads to reversal in the ΔH_rel increase
(Fig. 2c), the microhardness reduction appears to saturate (Fig. 4b
inset) while the plasticity continues to improve (Fig. 4a, b); the onset
of yield (Fig. 3a) is facilitated more than general yielding (Fig. 3b) and
the effect on high-frequency (MHz in RUS) macroscopic elastic moduli
is negligible (Extended Data Fig. 5). These contrasts for different
properties lead us to suggest that thermal cycling introduces heterogeneities
(soft spots) that are particularly effective in initiating flow
and improving plasticity, with lesser effects on the average structure of
the metallic glass. That the effects (at least for ΔH_rel, Fig. 2c) are greater
in melt-spun ribbon than in BMG of similar composition suggests that
pre-existing heterogeneity (greater in a less-relaxed metallic glass)
helps cycling to introduce more heterogeneity; this is expected if, as
we speculate, the effect is due to non-affine thermal strains arising
from the heterogeneity itself. The present results, however, do not
identify any characteristic length scale for the heterogeneity.

Rejuvenation of metallic glasses, with improved plasticity, has
been achieved by elastostatic loading, ion irradiation and plastic
deformation. In comparison with these methods, thermal cycling is
attractive: it is non-destructive, not involving shape change; unlike an 
applied elastostatic stress, it is isotropic and cannot introduce aniso- 
tropy; it can be applied to any sample (thin film, ribbon or bulk) and 
repeated as necessary; it changes the whole sample (not just the surface 
as, for example, in shot-peening or ion irradiation, or only inside shear 
bands); and it is controllable, can be performed in situ, induces no 
macroscopic residual stresses, involves strains far below the elastic 
limit, and induces no macroscopic plastic flow. 


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper.

METHODS


Metallic glasses in this study. Sample preparation and property measurements
took place in three laboratories. Kinetic phenomena in glasses are best compared
on a homologous temperature scale, using the glass-transition temperature T_g as
the normalizing parameter. The selected glasses allow studies over a range of T/T_g:
for example, room temperature (293 K) covers the range of T/T_g = 0.42-0.62 for
these glasses. (The values of T_g were determined using differential scanning
calorimetry at standard heating rates of 20 and 40 K min⁻¹.) The four selected glasses are:
(1) Cu₄₆Zr₄₆Al₇Gd₁ rod, 3 mm diameter (T_g = 700 K (ref. 30)); (2) La₅₅Ni₂₀Al₂₅
(at.%) melt-spun ribbon 40 μm thick (T_g = 475 K); (3) La₅₅Ni₁₀Al₃₅ rod 3 mm
diameter (T_g = 474 K); and (4) Zr₆₂Cu₂₄Fe₅Al₉ rods, 1.5, 2.0 or 2.5 mm diameter
(T_g = 658 K). Master alloys were prepared by arc-melting 3-4N pure elements.
Glassy ribbons, 40 μm thick, were produced by melt-spinning, under argon
atmosphere, on to a single copper wheel at 4,000 r.p.m. Bulk metallic glass (BMG) rods,
with diameter 1.5-3.0 mm and length up to 80 mm, were cast, under Ti-gettered
argon atmosphere, into water-cooled Cu-moulds, using induction melting and
suction casting.

Thermal cycling. The La-based and Cu₄₆Zr₄₆Al₇Gd₁ samples were inserted into
liquid nitrogen for 1 min, then heated by a hair-dryer set at room temperature, and
held for 1 min; samples were treated with up to 25 such cycles. The Zr₆₂Cu₂₄Fe₅Al₉
samples were inserted into liquid nitrogen for 1 min, then into ethanol at
333-343 K for 1 min; samples were treated with up to 60 such cycles.

The paper reports only results from cycling between near room temperature
and liquid-nitrogen temperature (77 K). Keeping the upper temperature fixed
(near room temperature), it is expected that there would be an optimum lower 
temperature for maximum effect of thermal cycling (at given cooling and heating 
rates, and hold times at the upper and lower temperatures): for higher tempera- 
tures closer to room temperature, there would be less driving force for structural 
change; for lower temperatures further from room temperature, the rate of struc- 
tural change near the lower temperature would be lower. 

In addition to cycling down to 77 K, melt-spun ribbons of La₅₅Ni₂₀Al₂₅ glass
were cycled down to dry-ice (CO₂) temperature (195 K) and to liquid-helium
temperature (4.2 K) to explore the role of the lower temperature. The state of
the samples was assessed using DSC (with heating rates from 20 to 40 K min⁻¹).
For a given type of sample, the heat of relaxation ΔH_rel is lower for higher heating
rate; it is therefore appropriate to normalize the ΔH_rel values relative to that for the
as-cast glass measured at the same DSC heating rate. Measurements were made for
samples: (1) held at 195 K for 100 min, for which the ratio of ΔH_rel to that of the
as-cast glass is 1.23; (2) treated with ten cycles of 1 min at 195 K, for which the ratio is
1.24; (3) treated with ten cycles of 10 min at 195 K (ratio = 1.02); (4) treated with
ten cycles of 1 min at 77 K (ratio = 1.55); and (5) treated with 13 cycles to 4.2 K
(ratio = 1.37). Given the low number of samples tested, there is significant
uncertainty in these relative values (±15%); nevertheless, it seems that, within the
limited range of these tests, cycling down to 77 K produces the strongest effect.
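
As an illustrative sketch (not part of the original analysis), the normalized ratios listed above can be screened against the quoted ±15% uncertainty. Treating that uncertainty as a relative error on each ratio is an assumption made here, and the dictionary keys are simply labels chosen for this example.

```python
# Ratios of the heat of relaxation to the as-cast value, as listed in the text.
RATIOS = {
    "held 100 min at 195 K": 1.23,
    "10 cycles, 1 min at 195 K": 1.24,
    "10 cycles, 10 min at 195 K": 1.02,
    "10 cycles, 1 min at 77 K": 1.55,
    "13 cycles to 4.2 K": 1.37,
}

# Assumption: the quoted +/-15% is a relative uncertainty on each ratio.
REL_UNCERTAINTY = 0.15


def lower_bound(ratio, rel=REL_UNCERTAINTY):
    """Smallest ratio still compatible with the stated uncertainty."""
    return ratio * (1.0 - rel)


# Treatments whose ratio stays above 1 (rejuvenation) even at the lower bound.
significant = [name for name, r in RATIOS.items() if lower_bound(r) > 1.0]

# Treatment with the largest nominal ratio.
strongest = max(RATIOS, key=RATIOS.get)

print("Above as-cast beyond the uncertainty:", significant)
print("Largest ratio:", strongest)
```

Under this reading, only the ten cycles with 10-min holds at 195 K are indistinguishable from the as-cast state, and the 77 K cycles give the largest ratio, in line with the statement above.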

A Ti-based BMG cycled up to 400 times between 77 K and 423 K (to simulate
conditions on the surface of a spacecraft in low earth orbit) showed slight
relaxation rather than rejuvenation. This suggests that there is a significant annealing
effect during the holds (25 min each) at the upper temperature, which itself is
higher than in our work (and corresponds to T/T_g = 0.68, inside the range
associated with β relaxation, see Fig. 1a in main text).
Stresses associated with thermal cycling. As noted in the paper, it is important to
understand whether the effects of thermal cycling are due to the temperature
changes themselves or to stresses arising from non-uniform temperature in the
samples during cooling and heating. Whether the temperature is significantly
non-uniform can be assessed using the dimensionless Biot number,

$$\mathrm{Bi} = \frac{hL}{\kappa} \qquad (1)$$

where h is the heat-transfer coefficient between the sample and the surrounding
medium, L is a characteristic sample dimension (usually taken to be the ratio of
sample volume to surface area), and κ is the thermal conductivity of the sample
material. For a flat plate, L is the half-thickness; for a cylinder the half-radius. If
Bi < 0.1, the temperature differences within the sample are negligible compared to
the step in temperature between the sample and its surroundings (Newtonian
cooling). Conversely, if Bi > 1, there are significant temperature gradients in the
sample. We examine the conditions for samples immersed in liquid nitrogen.
Cooling in liquid nitrogen has been widely studied, particularly in connection
with cryopreservation. The ‘French straw’ sample containers used are cylinders
of 2.8 mm diameter, and so are very similar in dimension to the cast BMG rods in
the current work. On first immersion, there is film boiling, in which a film of
vapour limits the heat transfer; in this regime h = 148 W m⁻² K⁻¹. As the sample
cools, there is a transition to the nucleate-boiling regime in which
h = 1,355 W m⁻² K⁻¹. The transition point has been measured for a variety of
metallic samples of geometry comparable to the samples in the present work; it
occurs at temperatures ranging from 150 to 100 K. Thus a room-temperature sample immersed
in liquid nitrogen spends around 75% of the temperature range down to 77 K with
the relatively poor heat transfer associated with the film-boiling regime.

We used materials parameters determined for Vitreloy 1; Zr-based BMGs of
this kind are among those for which thermal-cycling treatments might be
put into practice. For Vitreloy 1, the thermal conductivity decreases with
decreasing temperature and starts to level off below room temperature. At 300 K,
κ = 4.59 W K⁻¹ m⁻¹; we use a rough estimate of κ = 4 W K⁻¹ m⁻¹ for the
temperature range in the present work. Extended Data Table 1 gives the estimates of Bi
for the various sample geometries of interest.

We can see that for nearly all of these cases Bi < 0.1, and therefore temperature 
gradients (and thermal stresses) in the samples are negligible. Only for the largest- 
diameter cylinders in the final (nucleate boiling) stages of cooling is there any 
significant deviation from Newtonian cooling. These calculations support the 
experimental work (Extended Data Fig. 2 and associated discussion, see below) 
suggesting that sample size is not a significant factor in the thermal-cycling effect, 
and reinforce the conclusion that the effect is due to the temperature changes 
themselves and not due to any stresses arising from non-uniform temperature 
profiles in samples during cooling and heating. 
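
The Biot-number estimates can be reproduced with a few lines of code. The following is a minimal sketch, assuming the heat-transfer coefficients and the rough thermal conductivity quoted above, with L taken as the half-thickness of a plate or the half-radius of a long cylinder; the function names and sample labels are illustrative only.

```python
# Biot number Bi = h * L / kappa for samples immersed in liquid nitrogen.
H_FILM = 148.0       # W m^-2 K^-1, film-boiling heat-transfer coefficient (from the text)
H_NUCLEATE = 1355.0  # W m^-2 K^-1, nucleate-boiling coefficient (from the text)
KAPPA = 4.0          # W K^-1 m^-1, rough thermal conductivity used in the text


def plate_length(thickness_m):
    """Characteristic length of a flat plate: half its thickness."""
    return thickness_m / 2.0


def cylinder_length(diameter_m):
    """Characteristic length of a long cylinder: half its radius."""
    return diameter_m / 4.0


def biot(h, length_m, kappa=KAPPA):
    """Dimensionless Biot number for heat-transfer coefficient h."""
    return h * length_m / kappa


# Hypothetical labels; the dimensions are those of samples described in the text.
SAMPLES = {
    "ribbon, 40 um thick": plate_length(40e-6),
    "cylinder, 1.5 mm diam.": cylinder_length(1.5e-3),
    "cylinder, 3.0 mm diam.": cylinder_length(3.0e-3),
}

for name, length in SAMPLES.items():
    print(f"{name}: Bi(film) = {biot(H_FILM, length):.2g}, "
          f"Bi(nucleate) = {biot(H_NUCLEATE, length):.2g}")
```

With these inputs, every film-boiling value lies well below 0.1, and only the cylinders in the nucleate-boiling regime rise above that threshold.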

It is of interest to consider the possible extent of thermal stresses associated
with non-uniform sample temperature. Taking the value determined for the
Zr-based BMG Vitreloy 106, the linear coefficient of thermal expansion is
α = 8.7 × 10⁻⁶ K⁻¹. For a room-temperature sample immersed in liquid nitrogen,
the maximum possible temperature difference during cooling is between 293 K
(room temperature, possible at the sample centre) and 77 K (possible at the sample
surface). For this temperature difference, the linear thermal strain is 0.19%, which
would give internal stresses of the order of 10% of the macroscopic yield stress. In
practice, as highlighted by the low values for the Biot number in Extended Data
Table 1, any thermal stresses would be much lower than this value. Rejuvenation of
metallic glasses by elastostatic loading typically involves much higher stresses (up
to 95% of the yield stress) for much longer times (tens of hours, rather than a few
minutes). Again, it seems that stress effects in the current thermal cycling should
be negligible.
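
The strain estimate follows directly from the expansion coefficient and the temperature difference. The sketch below reproduces it and adds a crude stress comparison; the Young's modulus (85 GPa) is an assumed, typical value for a Zr-based BMG and is not given in the text, and the yield stress is the 1.68 GPa quoted in Fig. 4 for a different Zr-based glass, so the stress fraction is indicative only.

```python
ALPHA = 8.7e-6   # K^-1, linear thermal expansion coefficient (from the text)
T_ROOM = 293.0   # K, warmest possible point (sample centre)
T_LN2 = 77.0     # K, coldest possible point (sample surface)


def thermal_strain(alpha, t_hot, t_cold):
    """Linear thermal strain for a temperature difference t_hot - t_cold."""
    return alpha * (t_hot - t_cold)


strain = thermal_strain(ALPHA, T_ROOM, T_LN2)

# Assumptions for illustration only (not from the text):
ASSUMED_YOUNG_MODULUS_PA = 85e9  # typical order for Zr-based BMGs
YIELD_STRESS_PA = 1.68e9         # yield stress quoted in Fig. 4 for a Zr-based BMG

# Crude uniaxial estimate: stress = E * strain.
stress_fraction = ASSUMED_YOUNG_MODULUS_PA * strain / YIELD_STRESS_PA

print(f"maximum linear thermal strain: {strain * 100:.2f}%")
print(f"stress as a fraction of yield (assumed E): {stress_fraction:.2f}")
```

The strain comes out at about 0.19%, and with the assumed modulus the corresponding stress is of the order of one-tenth of the yield stress.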
Differential scanning calorimetry. This was performed (as in Fig. 2) using a
Q2000 DSC (TA Instruments). Samples were heated at 20 K min⁻¹ from room
temperature to 503 K (into the supercooled liquid state), held for 2 min, then
cooled to room temperature at 20 K min⁻¹. A second cycle using the same
procedure was used as the baseline for subtraction from the first cycle. The heat of
relaxation ΔH_rel is calculated from the area between the two curves, from the onset
of relaxation to the glass transition. Additional data were acquired using a Perkin
Elmer DSC 8500 at a heating rate of 40 K min⁻¹.
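
The baseline-subtraction step can be sketched as a trapezoidal integration of the difference between the two scans. This is an illustration under stated assumptions rather than the instrument software's method: heat flow is taken in W mol⁻¹ with endothermic signals positive (so the relaxation exotherm lowers the first scan below the second), and the five data points are synthetic.

```python
def heat_of_relaxation(temps_k, flow_first, flow_second, rate_k_per_min,
                       t_onset, t_glass):
    """Area between second (baseline) and first scan, in J mol^-1.

    Integrates (second - first) over temperature between t_onset and t_glass
    with the trapezoidal rule, then divides by the heating rate in K s^-1 to
    convert W K mol^-1 into J mol^-1.
    """
    rate_k_per_s = rate_k_per_min / 60.0
    points = [(t, f2 - f1)
              for t, f1, f2 in zip(temps_k, flow_first, flow_second)
              if t_onset <= t <= t_glass]
    area = 0.0
    for (t0, d0), (t1, d1) in zip(points, points[1:]):
        area += 0.5 * (d0 + d1) * (t1 - t0)
    return area / rate_k_per_s


# Synthetic example: a triangular exotherm on a flat baseline.
temps = [380.0, 400.0, 420.0, 440.0, 460.0]   # K
first_scan = [0.0, -0.5, -1.0, -0.5, 0.0]     # W mol^-1
second_scan = [0.0, 0.0, 0.0, 0.0, 0.0]       # W mol^-1

dh_rel = heat_of_relaxation(temps, first_scan, second_scan,
                            rate_k_per_min=20.0, t_onset=380.0, t_glass=460.0)
print(f"heat of relaxation for the synthetic trace: {dh_rel:.1f} J mol^-1")
```

Because the integral is divided by the heating rate, traces recorded at 20 and 40 K min⁻¹ have to be processed with their own rates before any comparison.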

Extended Data Figure 1a shows examples of DSC traces used to assess the
reproducibility of measurements of the heat of relaxation ΔH_rel (using the
baseline-subtraction method outlined above). Six samples of as-cast ribbons were
tested, giving an average value of ΔH_rel = 740 J mol⁻¹, with a standard deviation
of 30 J mol⁻¹ (that is, 4%).

As noted in the paper, it is important to know whether the effects of thermal 
cycling between near room temperature and 77 K are due just to the time spent at 
77 K. Samples of La₅₅Ni₂₀Al₂₅ glassy ribbons were held at 77 K for 4 h (a time much
longer than the total spent at 77 K in the course of ten room temperature–77 K
cycles). Within experimental error (Extended Data Fig. 1b), such holds at 77 K
appear to have no effect on ΔH_rel. Of course, a hold at 77 K represents a single
cycle. As shown by the nano-indentation results in Fig. 3 in the main paper, even
one cycle can have a measurable effect on properties. In the present case of ΔH_rel,
however, any change as a result of one cycle is within the experimental uncertainty.

As noted in the main paper and discussed above in the Methods, it is necessary 
to examine whether the effects of thermal cycling are dependent on sample size. 
The non-uniform temperature in a larger sample could give rise to stresses and the 
cycling of those stresses could have effects. As-cast rods (3 mm diameter) of
La₅₅Ni₁₀Al₃₅ BMG were subjected to room temperature–77 K cycles at the same
time as thin (250-500 μm thick) discs pre-cut from the rods. After cycling, discs
were cut from the centres of the bulk rods, so that samples of the same glass and
same DSC-sample size could be compared after thermal cycling in two different
geometries (thin disc and bulk). Extended Data Figure 2 shows that the changes in
ΔH_rel are essentially the same for the samples thermally cycled in disc and in bulk
form, suggesting that the stresses generated during cycling have a negligible effect. 
The two sets of data are combined in Fig. 2c. 

Instrumented indentation. This was performed using an XP Nanoindenter
(MTS Systems Corp.) at room temperature under load control up to a maximum
load (F_max) of 40 mN. A diamond spherical indenter of tip radius 8.074 μm was
used, its area function calibrated using fused silica and sapphire standards.
Loading and unloading rates were 0.5 mN s⁻¹; the drift rate was maintained below
0.07 nm s⁻¹. Samples were loaded to F_max, then unloaded to 0.05F_max and
maintained at that load for 60 s for measurement of thermal drift. The cumulative
distribution curves (Fig. 3) include 39-48 data points. Initial yield events
(pop-ins) were identified directly from the F-h curve and the associated spike
in indenter tip velocity, or from the first deviation from the Hertzian elastic
solution fitted to the F-h curve. Hardness was determined by the standard
Oliver and Pharr method, on unloading at F_max.

Tests were conducted on melt-spun ribbons of La₅₅Ni₂₀Al₂₅ glass under the
conditions noted above. Typical load-displacement curves (Extended Data Fig. 3)
show pop-ins (that is, the transition from purely elastic to elastic-plastic
deformation) with a moderate sharpness that is not affected by thermal cycling. The
pop-ins are characterized by the values of initial yield load F_y and initial yield
displacement Δh. Initial yield pressure P_y is calculated from:

$$P_y = \frac{F_y}{\pi a_y^2} \qquad (2)$$

where a_y is the contact radius at initial yield. The reduced or indentation modulus
E_r is determined by the standard method, on unloading from F_max. E_r is also
determined by using the Hertzian elastic solution to fit the shape of the
load-displacement curve up to the first pop-in. E_r is given by:

$$\frac{1}{E_r} = \frac{1-\nu^2}{E} + \frac{1-\nu_i^2}{E_i} \qquad (3)$$

where E and ν are, respectively, the Young modulus and Poisson ratio of the glass,
while the corresponding values for the diamond indenter tip are taken to be
E_i = 1,141 GPa and ν_i = 0.07. Using equation (3), values of E were obtained.
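
Equation (3), 1/E_r = (1 − ν²)/E + (1 − ν_i²)/E_i, can be inverted for the Young's modulus of the glass once its Poisson ratio is known. The sketch below shows that inversion; the Poisson ratio of 0.35 used in the example is an assumption chosen for illustration (the text does not state the value used), and the function names are hypothetical.

```python
E_INDENTER_GPA = 1141.0  # Young's modulus of the diamond tip (from the text)
NU_INDENTER = 0.07       # Poisson ratio of the diamond tip (from the text)


def reduced_modulus(e_sample, nu_sample,
                    e_ind=E_INDENTER_GPA, nu_ind=NU_INDENTER):
    """Reduced modulus E_r (GPa) from equation (3)."""
    compliance = (1.0 - nu_sample ** 2) / e_sample + (1.0 - nu_ind ** 2) / e_ind
    return 1.0 / compliance


def youngs_modulus(e_reduced, nu_sample,
                   e_ind=E_INDENTER_GPA, nu_ind=NU_INDENTER):
    """Young's modulus E (GPa) of the sample, inverting equation (3)."""
    sample_compliance = 1.0 / e_reduced - (1.0 - nu_ind ** 2) / e_ind
    return (1.0 - nu_sample ** 2) / sample_compliance


# Round trip with an assumed Poisson ratio (illustrative, not from the text).
ASSUMED_NU = 0.35
e_r = reduced_modulus(44.29, ASSUMED_NU)
e_back = youngs_modulus(e_r, ASSUMED_NU)
print(f"E_r = {e_r:.2f} GPa -> E = {e_back:.2f} GPa")
```

Because the diamond tip is far stiffer than the glass, the indenter term is a small correction, and the result is more sensitive to the assumed Poisson ratio of the sample.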

Tests were conducted on ribbon samples in three states: (1) as-cast; (2) after one
cycle from room temperature to a 10-min hold at 77 K and back to room
temperature; and (3) after a further ten cycles to 77 K each with 1 min hold. The effect of a
longer hold at 77 K can thus be compared with cycling. All metallic-glass samples,
as-cast or treated, are found to have some variation in properties over length scales 
of millimetres. For this reason it is important to compare data taken from a given 
region of the sample surface. In the present work, indentations were made in the 
same region (within 200 um) of the given sample surface in the three successive 
states. Extended Data Table 2 collects data on the instrumented-indentation tests 
in the present work. Cumulative distributions, such as those shown in Fig. 3, are
characterized by the median value and by the 1st and 9th deciles.

Extended Data Figure 4a shows the distribution of values of initial yield pressure
P_y and initial yield displacement Δh for the three states of the La₅₅Ni₂₀Al₂₅ glass
ribbons. On cycling, both P_y and Δh are reduced. There is a correlation, as
expected, between higher values of P_y and higher values of Δh. The spread in
values of Δh is particularly wide. After ten room temperature–77 K cycles, there
is a marked reduction in the incidence of larger values: the initial yield events are
smaller, reflecting less catastrophic shear banding.

The Young’s modulus of the glass E determined by unloading from full load
(Extended Data Fig. 4b) shows similar effects to those found for hardness H
(Fig. 3b). The median value of E_unload in the as-cast sample decreases by 0.6%
after 10 min hold at 77 K (compared to a 0.9% decrease in H), and by a further 7%
after ten room temperature–77 K (1 min) cycles (compared to 4% for H). After
the cycling treatments the relative width (1st to 9th decile) of the E distribution
(±2.4%, compared to ±3.1% for H) is greater than in the as-cast sample (±1.4%,
compared to ±1.3% for H).

The Young’s modulus determined from the Hertzian fit of the F-h curve up to
the first pop-in (Extended Data Fig. 4c) has a median value ~5% lower than that
determined by unloading. The median value of E_Hertz in the as-cast sample
decreases by 2.4% after 10-min hold at 77 K, and by a further 8.8% after ten room
temperature–77 K (1 min) cycles. The relative width of the E_Hertz distribution in
the as-cast sample (±1.6%) is similar to that for E_unload, but this width increases
more (to ±7.3%) after the cycling treatments. This suggests that the elastic
properties up to the first pop-in are more sensitive to local variations in the sample.
Compression tests. These were made on Instron 5581 and Hounsfield 25 kN
machines, loading at a strain rate of 5 × 10⁻⁴ s⁻¹. The Zr₆₂Cu₂₄Fe₅Al₉ samples
were cylindrical rods of 2:1 height:diameter, with diameter of 1.5, 2.0 or 2.5 mm.
The Cu₄₆Zr₄₆Al₇Gd₁ samples were 1.5 × 1.5 × 3.0 mm³ cuboids cut from the
original cast rod of 3 mm diameter. All samples were polished to a mirror finish.
The gradients of the elastic portions of the stress-strain curves in Fig. 4a are
affected by the machine compliance and underestimate the Young’s modulus.
The curves are used only to extract values for the plastic strain as shown. The
curves in Fig. 4c are almost fully corrected for machine compliance effects.

The curves in Fig. 4a and Fig. 4c show engineering stress and engineering strain.
The strain values are calculated from the displacement of the testing machine and
not from strain gauges on the samples. The gradients of the elastic portions of
the curves are therefore affected by the machine compliance and significantly
underestimate the Young’s modulus of the materials. The curves are used only
to extract values for the plastic strain before catastrophic failure. Figure 4b collects
data for rod samples of Zr₆₂Cu₂₄Fe₅Al₉ BMG of three diameters: 1.5 mm, 2.0 mm
and 2.5 mm, and corresponding heights 3.0 mm, 4.0 mm and 5.0 mm. The
stress-strain curves for 1.5 mm samples are shown in Fig. 4a; those for 2.0 mm and 2.5 mm
samples in Extended Data Fig. 7.

As has been much discussed, larger samples are more susceptible to
catastrophic failure through shear localization. Such size effects are extrinsic, and relate
only to the mechanical tests themselves. In addition, there is an intrinsic effect: 
larger samples have a lower cooling rate during casting and so are more relaxed 
and tend to be more brittle in the as-cast state. 

Resonant ultrasound spectroscopy. RUS measurements were conducted as in
earlier work. Measurements were at room temperature on 1.5 × 1.5 × 3.0 mm³
cuboids cut from the as-cast 3-mm-diameter rod of Cu₄₆Zr₄₆Al₇Gd₁ BMG. For
each sample, a resonance spectrum was collected in the range 0.05-2.0 MHz with
65,000 data points. To detect changes in shear and bulk modulus, 10,000 data
points were collected in each case in the ranges 0.900-0.910 MHz and 0.978-0.982
MHz, respectively. Within experimental error (less than ±0.1% for the shear
modulus), room temperature–77 K thermal cycling has no effect on the elastic
moduli of the glass (Extended Data Fig. 5). This suggests that the reductions
in Young’s modulus (Extended Data Fig. 4b, c) and correspondingly hardness
(Fig. 3b) seen under quasi-static conditions are largely attributable to anelastic
(that is, time-dependent elastic) strains.

Dynamic mechanical analysis. This technique (not discussed in the paper),
permitting measurement of the loss and storage moduli of a material, was applied
(using methods reported previously) to ribbons of La₅₅Ni₂₀Al₂₅ glass (Extended
Data Fig. 8) to explore the nature of the rejuvenation effect. The loss modulus
shows a sharp maximum near the glass transition, associated with α relaxation.
After ten room temperature–77 K cycles, the maximum is lowered by 2.3 K. This
earlier onset of softening on heating is consistent with a rejuvenated state.
The broad maximum at lower temperature (centred around 115 °C, Extended
Data Fig. 8) shows β relaxation. The β relaxation in metallic glasses has been
linked to mechanical properties. In view of the improved plasticity seen after
thermal cycling (Fig. 4 and Extended Data Fig. 7), it is surprising that the effect of
thermal cycling on the β relaxation is so slight (inset in Extended Data Fig. 8),
especially compared to the clear effect on the α relaxation.

X-ray diffraction. There are reports that under cyclic mechanical loading at room
temperature, in the elastic regime, a Zr-based BMG can undergo some
crystallization. Deep cryogenic treatment of steels certainly induces phase changes. In
the present work, X-ray diffraction (Bragg-Brentano geometry, Bruker D8
instrument, Cu Kα radiation) was used to check that the samples remained fully glassy
throughout, and especially after thermal cycling. Representative diffractograms for
ribbon and bulk samples are shown in Extended Data Fig. 6. We found no evidence
for any crystallization induced by the thermal-cycling treatments. The property
changes suggest that the cycling does induce changes in the glassy structure but,
typical of such structural relaxation, these changes are too subtle to be detectable
with the simple X-ray methods used in the present work.

Other methods. Microhardness was measured on an Akashi MVK-HVL hardness
testing machine, with 20 measurements for each data point. Scanning electron
microscopy (secondary-electron imaging) was performed using a JEOL 5200
instrument.

Extended Data Figure 1 | DSC traces for ribbons of La₅₅Ni₂₀Al₂₅ glass
(heated at 20 K min⁻¹). a, These example traces for as-cast samples show the
exotherm, just below the glass-transition temperature, from which the heat
of relaxation ΔH_rel is determined. b, The effect of holding at 77 K: the traces
show that the heat of relaxation ΔH_rel, given by the exotherm just below
the glass-transition temperature, is, within the experimental error of 4%, the
same for an as-cast ribbon, and for a sample held for 4 h at 77 K.

Extended Data Figure 2 | Differential scanning calorimetry (DSC) of
La₅₅Ni₁₀Al₃₅ bulk metallic glass. The heat of relaxation ΔH_rel is compared for
discs (250-500 μm thick) pre-cut from the bulk rod and then treated with room
temperature–77 K cycles, and for discs post-cut from treated rod samples.
There is no difference between these cases within the error of ±30 J mol⁻¹.

Extended Data Figure 3 | Load-displacement curves for ribbons of
La₅₅Ni₂₀Al₂₅ metallic glass tested up to a maximum load F_max of 40 mN. The
initial yield load F_y is indicated on the curves for the glass in three states: as-cast,
after a 10-min hold at 77 K, and after a further ten room temperature–77 K
cycles each with 1 min hold.

Extended Data Figure 4 | Instrumented indentation of La₅₅Ni₂₀Al₂₅
metallic glass ribbon. a, Initial yielding, characterized by initial yield pressure
P_y and initial yield displacement Δh. Indentations are made for three states of
the sample: as-cast, after a 10-min hold at 77 K, and after a further ten room
temperature–77 K cycles each with 1-min hold. b, c, Distributions of the
Young’s modulus, using data from the same indentations of the sample in three
states as in Fig. 3 and in a. Values of the Young’s modulus in the glass E are
determined (b) by the standard Oliver and Pharr method on unloading from
F_max and (c) from Hertzian fitting to the F-h curve up to the first pop-in.

Extended Data Figure 5 | Elastic moduli derived from resonant ultrasound
spectroscopy for Cu₄₆Zr₄₆Al₇Gd₁ BMG treated with room temperature–77 K
thermal cycles. Shear modulus (left axis); bulk modulus (right axis).
Error bars, root mean square errors in the fitted frequencies.

Extended Data Figure 6 | X-ray diffraction traces for Zr-based metallic
glasses subjected to 338-77 K thermal cycles. a, For melt-spun ribbons of
Zr₆₁.₁Cu₂₆.₃Fe₂.₁Al₁₀.₅; b, for rods of Zr₆₂.₂Cu₂₃.₉Fe₄.₈Al₉.₁. No clear changes are
induced by the cycling treatments.

Extended Data Figure 7 | Compressive stress-strain curves for rod samples
of Zr₆₂Cu₂₄Fe₅Al₉ BMG. a, For rods of 2 mm diameter; b, for rods of 2.5 mm
diameter. In each case, increasing numbers of 338 K to 77 K thermal cycles
cause the plastic strain to increase.

Extended Data Figure 8 | Dynamic mechanical analysis of ribbons of
La₅₅Ni₂₀Al₂₅ metallic glass. The heating rate is 3 K min⁻¹. The general form of
the curve matches that shown in Fig. 1a, where an example was chosen of a
metallic glass showing β relaxation at a particularly low value of T/T_g. For
La₅₅Ni₂₀Al₂₅ the β relaxation is centred at T/T_g ≈ 0.8.

Extended Data Table 1 | Values of the Biot number Bi for immersion in liquid nitrogen of the various sample geometries in the present work

Sample geometry                  L (μm)     Bi, film boiling     Bi, nucleate boiling
melt-spun ribbon, 40 μm thick    20         7.4 × 10⁻⁴           6.8 × 10⁻³
disc, 250-500 μm thick           125-250    (4.6-9.2) × 10⁻³     (4.2-8.5) × 10⁻²
cylinder, 1.5 mm diam.           375        1.4 × 10⁻²           0.13
cylinder, 2.0 mm diam.           500        1.8 × 10⁻²           0.17
cylinder, 2.5 mm diam.           625        2.3 × 10⁻²           0.21
cylinder, 3.0 mm diam.           750        2.8 × 10⁻²           0.26

The film-boiling regime applies until the sample surface temperature decreases to between 150 K and 100 K, and the nucleate-boiling regime applies thereafter.

Extended Data Table 2 | Summary of instrumented-indentation tests

State                  as-cast                  | 10 min LN                | 10 min & 10 cycles LN
Property               10%     median  90%      | 10%     median  90%      | 10%     median  90%
No. of indents                 48               |         39               |         47
F_y (mN) ± 0.1         10.6    14.3    16.0     | 10.6    13.5    14.3     | 5.8     9.3     11.5
Δh (nm) ± 0.2          5.1     8.2     11.4     | 3.6     6.8     10.7     | 3.2     4.5     6.5
P_y (GPa) ± 0.015      2.69    2.98    3.12     | 2.61    2.89    2.96     | 1.97    2.38    2.66
h_max (nm) ± 1         442     445     …        | …       …       …        | …       …       …
H (GPa) ± 0.003        2.607   2.647   2.678    | 2.576   2.622   2.672    | 2.450   2.518   2.607
E_unload (GPa) ± 0.04  43.52   44.29   44.79    | 43.47   44.04   44.47    | 39.65   40.93   41.59
E_Hertz (GPa) ± 0.04   41.95   42.65   43.33    | 40.83   41.64   42.22    | 34.84   37.90   40.35

Cumulative distributions, such as those shown in Fig. 3, are characterized by the median value and by the first and ninth deciles.

Graphene kirigami


Melina K. Blees, Arthur W. Barnard, Peter A. Rose, Samantha P. Roberts, Kathryn L. McGill, Pinshane Y. Huang,
Alexander R. Ruyack, Joshua W. Kevek, Bryce Kobrin, David A. Muller & Paul L. McEuen


For centuries, practitioners of origami (‘ori’, fold; ‘kami’, paper)
and kirigami (‘kiru’, cut) have fashioned sheets of paper into beautiful
and complex three-dimensional structures. Both techniques
are scalable, and scientists and engineers are adapting them to
different two-dimensional starting materials to create structures
from the macro- to the microscale. Here we show that
graphene is well suited for kirigami, allowing us to build robust
microscale structures with tunable mechanical properties. The
material parameter crucial for kirigami is the Föppl-von
Kármán number γ: an indication of the ratio between in-plane
stiffness and out-of-plane bending stiffness, with high numbers
corresponding to membranes that more easily bend and crumple
than they stretch and shear. To determine γ, we measure the
bending stiffness of graphene monolayers that are 10-100 micrometres
in size and obtain a value that is thousands of times higher
than the predicted atomic-scale bending stiffness. Interferometric
imaging attributes this finding to ripples in the membrane
that stiffen the graphene sheets considerably, to the extent that
γ is comparable to that of a standard piece of paper. We may
therefore apply ideas from kirigami to graphene sheets to build
mechanical metamaterials such as stretchable electrodes, springs,
and hinges. These results establish graphene kirigami as a simple
yet powerful and customizable approach for fashioning
one-atom-thick graphene sheets into resilient and movable parts with
microscale dimensions.

Devices such as those shown in Fig. 1a, c, d are made from polycrystalline
monolayer graphene that is grown on copper by chemical
vapour deposition, and then transferred to fused silica wafers that are
covered with an aluminium release layer. We use optical lithography to 
pattern both the graphene and the 50-nm-thick gold pads that are 
deposited on top of the graphene to act as handles. Finally, we release 
the graphene from the surface by etching away the aluminium in mild 
acid. The devices remain in aqueous solution with added salts or 
surfactants as desired. An inverted white-light microscope with a video 
camera is used to image the sheets, and micromanipulators are used to 
probe them. 


Figure 1 | Patterning and manipulating 
graphene. a, Transmission white-light image 
showing completed devices: a spiral spring, a 
kirigami pyramid, and a variety of cantilevers. 
b, Manipulating a large sheet of graphene with a 
micromanipulator. The sheet folds and crumples 
like soft paper, and returns to its original shape. 
c, d, Manipulating devices with gold pads. The 
devices can be lifted entirely off the surface; see 
Supplementary Video 1. Scale bars are 10 μm.
All images and videos have undergone linear
contrast adjustments.

We move the graphene along the surface or peel it up entirely by
pushing a sharp probe tip into the gold pads or against the graphene
itself (Fig. 1b-d and Supplementary Video 1). The graphene’s elastic
behaviour is reminiscent of that of thin paper: it folds and crumples out
of plane, but does not notably stretch in plane (Fig. 1b). The process is
almost entirely reversible in the presence of surfactants, even after
considerable crumpling of the graphene.

Figure 2 | Measuring the bending stiffness of monolayer graphene.
a, Applying controlled forces to a gold pad using an infrared laser. The grey
triangle represents the probe tip that holds the device up off the surface; the red
triangle represents the focused laser beam. The cantilever displacement gives
the spring constant. b, Tracking the motion of a rotated device under thermal
fluctuations provides an independent measurement of the spring constant
(Extended Data Fig. 3). c, Stacked histogram of bending stiffness, and
interference micrographs of devices whose aluminium release layer has been
etched away, showing the structure of static ripples (inset). The spring constant
relates to the bending stiffness as $k = 3\kappa W/L^3$. The red arrow points to the
microscopic bending stiffness, $\kappa_0 = 1.2$ eV. Scale bars are 10 µm. Interference
images were averaged over 180 frames at 90 frames per second.

The mechanical properties relevant for kirigami are captured by the
Föppl–von Kármán number (refs 7, 8) for a square sheet of side length $L$ and
thickness $t$: $\gamma = Y_{2D}L^2/\kappa \sim (L/t)^2$, that is, the ratio between the
two-dimensional Young’s modulus $Y_{2D}$ and the out-of-plane bending
stiffness $\kappa$, multiplied by the length squared. To determine $\gamma$, we measure
$\kappa$ by using the photon pressure from an infrared laser to apply a known
force to a pad attached to a graphene cantilever and measuring the
resulting displacement (Fig. 2a). We also measure thermal fluctuations
of cantilevers to determine their spring constants (Fig. 2b and
Extended Data Fig. 4), which, according to the equipartition theorem,
are $k = k_BT/\langle x_{th}^2\rangle$, where $T$ is temperature, $k_B$ is Boltzmann’s constant,
and $\langle x_{th}^2\rangle$ is the time-averaged square of the cantilever thermal
fluctuation amplitude. Although the presence of water (the aqueous
solution in which the device is immersed) slows down the fluctuations, it
does not change the spring constant (refs 15, 16). Cantilevers with lengths of
8–80 µm and widths of 2–15 µm have spring constants of $10^{-5}$–$10^{-8}$ N m$^{-1}$.
These are astonishingly soft springs, as many as eight
orders of magnitude softer than a typical atomic force microscope
cantilever. The bending stiffness $\kappa$ is inferred from the measured
spring constant using $k = 3\kappa W/L^3$, where $W$ and $L$ are the width
and length of the cantilevers, respectively. The values obtained from
these thermal measurements and the laser measurements are shown in
Fig. 2c, and are seen to be orders of magnitude higher than $\kappa_0 = 1.2$ eV,
which is the value that is predicted from the microscopic bending
stiffness of graphene (known from simulations, ref. 17, and measurements
of the phonon modes in graphite, ref. 18).
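
As a rough numerical check of the two relations used in this paragraph, the equipartition result $k = k_BT/\langle x_{th}^2\rangle$ and the cantilever formula $k = 3\kappa W/L^3$, the short sketch below reworks the example quoted in the Methods (a 40 µm × 10 µm cantilever with $\langle x_{th}^2\rangle$ = (130 nm)$^2$). It is an illustration only, not the analysis code used for the measurements; the function names are hypothetical and a temperature of 298 K is assumed, because the text does not state one.

```python
# Physical constants (exact SI values).
K_B = 1.380649e-23        # Boltzmann constant, J/K
EV = 1.602176634e-19      # joules per electronvolt


def spring_constant_from_fluctuations(x_rms_m, temperature_k=298.0):
    """Equipartition: k = k_B T / <x_th^2>, where <x_th^2> = x_rms_m**2."""
    return K_B * temperature_k / x_rms_m**2


def bending_stiffness(k, width_m, length_m):
    """Invert k = 3 kappa W / L^3 to obtain kappa in joules."""
    return k * length_m**3 / (3.0 * width_m)


# Example from the Methods: 40 um x 10 um cantilever, <x_th^2> = (130 nm)^2.
# The 298 K default temperature is an assumption, not a value from the text.
k = spring_constant_from_fluctuations(130e-9)
kappa_ev = bending_stiffness(k, width_m=10e-6, length_m=40e-6) / EV

print(f"k     = {k:.2e} N/m")
print(f"kappa = {kappa_ev / 1e3:.1f} keV (kappa_0 = 1.2 eV)")
```

Under these assumptions the result lands close to the quoted values of about 2.4 × 10$^{-7}$ N m$^{-1}$ and 3 keV, that is, more than three orders of magnitude above $\kappa_0$.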

Both thermal fluctuations and static ripples are predicted to notably
stiffen ultrathin crystalline membranes (refs 9–13, 19, 20) by effectively
thickening the membrane, similar to how a crumpled sheet of paper is more
rigid than a flat one. For static ripples, the effective bending stiffness is
predicted to be (ref. 9) $\kappa_{eff}/\kappa_0 = \sqrt{Y_{2D}\langle z_{eff}^2\rangle/\kappa_0}$, where $\langle z_{eff}^2\rangle$ is the
space-averaged square of the effective amplitude of the static ripples and
$Y_{2D} = 340$ N m$^{-1}$ is the two-dimensional Young’s modulus (ref. 3). For an
initially flat membrane with thermal fluctuations, the stiffness is
predicted to be $\kappa_{eff} = \kappa_0 (W/l_c)^{\eta}$, where $l_c = \sqrt{32\pi^3\kappa_0^2/(3Y_{2D}k_BT)}$ is the
Ginzburg length, and $\eta$ is a scaling exponent.

We look for static ripples in graphene cantilevers using interference
microscopy (ref. 21; inset of Fig. 2c). The black bands in such images are
regions of constant elevation, with the spacing between black and
white bands corresponding to changes in $z$ of $\lambda/4 \approx 100$ nm (where
$\lambda$ is the wavelength, corrected for the refractive index of water). With a
typical $\langle z_{eff}^2\rangle$ value from these measurements of about (100 nm)$^2$, we
obtain an effective bending stiffness of $\kappa_{eff}/\kappa_0 \approx 4{,}000$.
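
The factor of roughly 4,000 follows directly from the static-ripple prediction $\kappa_{eff}/\kappa_0 = \sqrt{Y_{2D}\langle z_{eff}^2\rangle/\kappa_0}$ with the values quoted in the text. The sketch below simply evaluates that expression; the function name is hypothetical and the code is an arithmetic illustration rather than part of the original analysis.

```python
import math

EV = 1.602176634e-19   # joules per electronvolt
Y_2D = 340.0           # two-dimensional Young's modulus, N/m (value from the text)
KAPPA_0 = 1.2 * EV     # microscopic bending stiffness, J (value from the text)


def static_ripple_stiffening(z_rms_m):
    """Return kappa_eff / kappa_0 = sqrt(Y_2D * <z_eff^2> / kappa_0)."""
    return math.sqrt(Y_2D * z_rms_m**2 / KAPPA_0)


# Typical ripple amplitude from the interference images: <z_eff^2> = (100 nm)^2.
ratio = static_ripple_stiffening(100e-9)
print(f"kappa_eff / kappa_0 = {ratio:.0f}")
```

Because the ratio scales linearly with the ripple amplitude, halving the amplitude to 50 nm would halve the predicted stiffening.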

Static ripples are present only after releasing graphene from the
surface (Extended Data Fig. 2), and are likely to be sample specific and
influenced by growth, fabrication details, and so on. Developing
growth and fabrication protocols that can change the amplitude of
the static ripples or eliminate them altogether is of great interest.
Other groups have observed ripples in suspended (strained) graphene
membranes (ref. 22), although they occur at a much smaller scale and their
origin remains a subject of debate. Moreover, the thermal theory
outlined above predicts a bending stiffness at room temperature due
to thermal fluctuations of $\kappa_{eff}/\kappa_0 \approx 1{,}000$ for an initially flat
membrane. These contradictory findings call for future experiments to
firmly establish the relative contribution to bending stiffness of
thermal fluctuations and static ripples (ref. 23). But irrespective of cause,
the high bending stiffness notably changes the effective $\gamma$ value,
$\gamma_{eff} = Y_{eff}L^2/\kappa_{eff}$. With the predicted renormalization (ref. 11) of $Y_{eff}$, we
find that $\gamma_{eff}$ is of the order of $10^5$–$10^7$ for a sheet of graphene
10 µm × 10 µm in size, close to that of a standard sheet of paper.
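
To see how strongly the stiffening changes the Föppl–von Kármán number, the sketch below evaluates $\gamma = YL^2/\kappa$ for a 10 µm × 10 µm sheet, once with the microscopic $\kappa_0$ and once with $\kappa_{eff} \approx 4{,}000\,\kappa_0$. It makes two simplifications that are assumptions of this example, not statements from the text: the Young’s modulus is left at its unrenormalized value of 340 N m$^{-1}$ (the size of the renormalization is not given here, so the second number is an upper estimate of $\gamma_{eff}$), and the paper sheet used for comparison is taken to be 10 cm wide and 0.1 mm thick.

```python
EV = 1.602176634e-19   # joules per electronvolt
Y_2D = 340.0           # N/m, unrenormalized two-dimensional Young's modulus
KAPPA_0 = 1.2 * EV     # J, microscopic bending stiffness


def foppl_von_karman(young_2d, side_m, kappa_j):
    """gamma = Y_2D * L^2 / kappa for a square sheet of side L."""
    return young_2d * side_m**2 / kappa_j


side = 10e-6  # 10 um sheet, as in the text

gamma_bare = foppl_von_karman(Y_2D, side, KAPPA_0)
# Stiffened sheet: kappa_eff ~ 4000 kappa_0; Y is NOT renormalized here,
# so this is an upper estimate of gamma_eff.
gamma_stiffened = foppl_von_karman(Y_2D, side, 4000 * KAPPA_0)

# Paper comparison via gamma ~ (L/t)^2; 10 cm x 0.1 mm is an assumed geometry.
gamma_paper = (0.10 / 1.0e-4) ** 2

print(f"bare graphene:      {gamma_bare:.1e}")
print(f"stiffened graphene: {gamma_stiffened:.1e}")
print(f"paper (assumed):    {gamma_paper:.1e}")
```

The bare value is of order $10^{11}$, whereas the stiffened value falls to a few times $10^7$ even before any softening of the in-plane modulus is included, which is why the sheet behaves mechanically much more like paper than its atomic thickness would suggest.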

The mechanical similarity between graphene and paper makes it 
easy to translate ideas and intuition directly from paper models to 
graphene devices. For example, the highly stretchable graphene tran- 
sistors in Fig. 3b, c and Supplementary Video 2 are based on a simple 
kirigami pattern of alternating, offset cuts and are created using photo- 
lithography. Here, the elasticity of the kirigami spring is determined by 
the pattern of cuts and the bending stiffness (rather than the Young’s 
modulus) of the graphene. As the reconstruction of the three-dimen- 
sional shape of a stretched and lifted device in Fig. 3e shows, the 
graphene strips pop up and bend out of plane as the spring is stretched. 

We measure the electrical response of these stretchable transistors
by gating them with an approximately 10 mM KCl solution
(see Methods for details). Figure 3d plots the liquid-gate dependence
of the conductance at a source–drain bias of 100 mV for a device in its
initial unstretched state (blue) and when stretched by 240% (orange).
The normalized change in conductance with gate voltage per graphene
square is 0.7 mS V$^{-1}$ and the resistance per graphene square at the
Dirac point is 12 kΩ, comparable to what has been reported for
electrolyte-gated graphene transistors (ref. 24). Because the graphene lattice itself
is not much strained when the kirigami spring is extended, we do not
expect or observe a notable change in the conductance curves between
the unstretched and stretched states, which is highly desirable for
stretchable electronics (ref. 25). Furthermore, stretching and unstretching a
similar device more than 1,000 times did not substantially change its 
electrical properties. 

Figure 3 | Stretchable graphene transistors.
a, b, Paper and graphene in-plane kirigami springs,
respectively. c, Graphene spring stretched by
about 70%. d, Electrical properties in
approximately 10 mM KCl. Conductance $G$ is
plotted against liquid-gate voltage $V_{LG}$ at source–drain
bias $V_{SD} = 100$ mV before stretching (blue)
and when stretched by 240% (orange). The top
(orange-boxed) inset is split because the stretched
device was larger than the visible area. e, Three-dimensional
reconstruction from a z-scan focal
series of a graphene spring. The right side remains
stuck to the surface and the left side is lifted.
Insets show views of sections of the graphene
(right) and paper models (left). Top images show
side views; bottom images show top views. The thin
grey lines are the bounding box from the three-dimensional
reconstruction. The aspect ratio of the
side-view paper model was compressed 1.8×.
Scale bars are 10 µm. See Supplementary Video 2.

Figure 4a and Supplementary Video 3 show graphene cut so that it
forms an out-of-plane pyramidal spring, along with a paper model. The
spring’s force–distance curve in Fig. 4b, measured with the photon
pressure from an infrared laser focused on the central pad, gives a spring
constant of $k = 2\times10^{-6}$ N m$^{-1}$. This value compares well with the
spring constant estimate of $(5\times10^{-7})$–$(5\times10^{-6})$ N m$^{-1}$, obtained
from our $\kappa$ measurements (Fig. 2) and the geometry of the device.

Figure 4 | Remote actuation. a, Paper model
(top) and as-fabricated graphene kirigami pyramid
(bottom). We actuate this out-of-plane spring
using an infrared laser. b, Schematic and force–distance
curve for a pyramid such as the one shown
in a. A linear fit at low forces yields
$k = 2\times10^{-6}$ N m$^{-1}$. c, A rotating static magnetic
field twists and untwists a long strip of graphene.
The gold pad is replaced by iron. d, A monolayer
graphene hinge actuated by a graphene arm.
Stills show the hinge closing. Supplementary Video
3 was taken after opening and closing this hinge
1,000 times; it survived 10,000 cycles. Scale bars are
10 µm.

Remote actuation of the kirigami devices is possible using magnetic
fields or linked graphene elements. Figure 4c illustrates magnetic
actuation, with magnetic forces and torques acting on an attached iron pad
allowing for parallel (many hinges being controlled simultaneously)
and complex manipulations to be made. The opening and closing of
the graphene hinge (which is 1 µm long and 10 µm wide) in Fig. 4d
uses a longer graphene strip (out of focus in the image) that extends in
a loop over the hinge to a gold pad, so that moving this probe along the
surface opens and closes the hinge (Supplementary Video 3). Although
the hinge is only one atom thick, it survived more than 10,000
open-and-close cycles (at which point the gold pads started to warp). This
remarkable resilience and the scope for scaling down to tens of
nanometres make monolayer graphene hinges ideally suited for use in
microscale moving parts.

We envisage that graphene kirigami will have many useful applica- 
tions. For example, springs like those in Figs 3 and 4 are easily 
designed, with spring constants that range from 1Nm' to 
10 *Nm | (which covers the full range from atomic force micro- 
scopes to optical traps), for use as force measurement devices with a 
simple visual readout and femtonewton force resolution. The addition 
of elements such as bimorphs (ref. 26) or chemical tags to graphene kirigami
devices (ref. 27) will create environment-responsive metamaterials. Kirigami
techniques can also easily be applied to other two-dimensional mate- 
rials that have different optical, electronic, and mechanical properties, 
which creates opportunities for further development of self-actuated 
two-dimensional functional devices that respond to light or magnetic 
fields, changes in temperature, or chemical signals. Such atomically 
thin membrane devices may be used for sensing, manipulation, com- 
plex origami, and nanoscale robotics. 


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 


Received 10 March; accepted 14 May 2015. 
Published online 29 July 2015. 


1. Wang-Iverson, P., Lang, R. J. & Yim, M. (eds) Origami 5: Fifth International Meeting of
Origami Science, Mathematics, and Education (CRC Press, 2011).

2. Hawkes, E. et al. Programmable matter by folding. Proc. Natl Acad. Sci. USA 107,
12441-12445 (2010).

3. Lee, C., Wei, X., Kysar, J. W. & Hone, J. Measurement of the elastic properties and
intrinsic strength of monolayer graphene. Science 321, 385-388 (2008).

4. Booth, T. J. et al. Macroscopic graphene membranes and their extraordinary
stiffness. Nano Lett. 8, 2442-2446 (2008).

5. Meyer, J. C. et al. The structure of suspended graphene sheets. Nature 446, 60-63
(2007).

6. Bunch, J. S. et al. Electromechanical resonators from graphene sheets. Science
315, 490-493 (2007).

7. Föppl, A. Vorlesungen über technische Mechanik (B. G. Teubner, 1905).

8. von Kármán, T. Festigkeitsproblem im Maschinenbau. Vol. 4 (Encyklopädie der
Mathematischen Wissenschaften, 1910).

9. Košmrlj, A. & Nelson, D. R. Mechanical properties of warped membranes. Phys.
Rev. E 88, 012136 (2013).

10. Nelson, D. R. & Peliti, L. Fluctuations in membranes with crystalline and hexatic
order. J. Phys. 48, 1085-1092 (1987).

11. Aronovitz, J. A. & Lubensky, T. C. Fluctuations of solid membranes. Phys. Rev. Lett.
60, 2634-2637 (1988).

12. Le Doussal, P. & Radzihovsky, L. Self-consistent theory of polymerized
membranes. Phys. Rev. Lett. 69, 1209-1212 (1992).

13. Los, J. H., Katsnelson, M. I., Yazyev, O. V., Zakharchenko, K. V. & Fasolino, A. Scaling
properties of flexible membranes from atomistic simulations: application to
graphene. Phys. Rev. B 80, 121405 (2009).

14. Li, X. et al. Large-area synthesis of high-quality and uniform graphene films on
copper foils. Science 324, 1312-1314 (2009).

15. Velasco, S. On the Brownian motion of a harmonically bound particle and the
theory of a Wiener process. Eur. J. Phys. 6, 259-265 (1985).

16. te Velthuis, A. J. W., Kerssemakers, J. W. J., Lipfert, J. & Dekker, N. H. Quantitative
guidelines for force calibration through spectral analysis of magnetic tweezers
data. Biophys. J. 99, 1292-1302 (2010).

17. Fasolino, A., Los, J. H. & Katsnelson, M. I. Intrinsic ripples in graphene. Nature
Mater. 6, 858-861 (2007).

18. Nicklow, R., Wakabayashi, N. & Smith, H. G. Lattice dynamics of pyrolytic graphite.
Phys. Rev. B 5, 4951-4962 (1972).

19. Roldán, R., Fasolino, A., Zakharchenko, K. V. & Katsnelson, M. I. Suppression of
anharmonicities in crystalline membranes by external strain. Phys. Rev. B 83,
174104 (2011).

20. Braghin, F. L. & Hasselmann, N. Thermal fluctuations of free-standing graphene.
Phys. Rev. B 82, 035407 (2010).

21. Georgiou, T. et al. Graphene bubbles with controllable curvature. Appl. Phys. Lett.
99, 093103 (2011).

22. Wang, W. L. et al. Direct imaging of atomic-scale ripples in few-layer graphene.
Nano Lett. 12, 2278-2282 (2012).

23. Košmrlj, A. & Nelson, D. R. Thermal excitations of warped membranes. Phys. Rev. E
89, 022126 (2014).

24. Chen, D., Tang, L. & Li, J. Graphene-based materials in electrochemistry. Chem.
Soc. Rev. 39, 3157-3180 (2010).

25. Rogers, J. A., Someya, T. & Huang, Y. Materials and mechanics for stretchable
electronics. Science 327, 1603-1607 (2010).

26. Zhu, S.-E. et al. Graphene-based bimorph microactuators. Nano Lett. 11, 977-981
(2011).

27. Yuk, J. M. et al. Graphene veils and sandwiches. Nano Lett. 11, 3290-3294
(2011).

Supplementary Information is available in the online version of the paper. 


Acknowledgements We thank D. Nelson, M. Bowick, A. Košmrlj, J. Alden, A. van der
Zande, and R. Martin-Wells for discussions. We thank E. Minot for assistance with
electrolyte gating. We thank R. Hovden for discussions on three-dimensional
reconstruction theory and techniques, and R. Hovden, M. Hanwell, and U. Ayachit for
developing and supporting the TomViz three-dimensional visualization software. We
thank J. Wardini, P. Ong, A. Zaretski, and S. P. Wang for additional graphene samples, 
and F. Parish with Cornell’s College of Architecture, Art, and Planning for assistance 
with the paper models. We also acknowledge the Origami Resource Center (http:// 
www.origami-resource-center.com/) for kirigami design ideas. This work was 
supported by the Cornell Center for Materials Research (National Science 
Foundation, NSF, grant DMR-1120296), the Office of Naval Research 
(N00014-13-1-0749), and the Kavli Institute at Cornell for Nanoscale Science. 
Devices were fabricated at the Cornell Nanoscale Science and Technology Facility, a 
member of the National Nanotechnology Infrastructure Network, which is supported 
by the NSF (ECCS-0335765). K.L.M. and P.Y.H. acknowledge support from the NSF 
Graduate Research Fellowship Program (DGE-1144153 and DGE-0707428). 
Tomography visualization software development was supported by a SBIR grant 
(DE-SC0011385). 


Author Contributions Device design and actuation techniques were developed by 
M.K.B., A.W.B., P.A.R., S.P.R., and K.L.M. under the supervision of P.L.M. Fabrication and 
characterization was performed by the above authors with additional support from 
A.R.R., J.W.K., and B.K. Bending stiffness measurements were designed by A.W.B. and
P.L.M. and carried out by M.K.B., P.A.R., and K.L.M. with data analysis by S.P.R., A.W.B., 
M.K.B., K.L.M., and P.A.R. under the supervision of P.L.M. Electrical measurements were 
performed by K.L.M. and M.K.B. under the supervision of P.L.M. Three-dimensional 
reconstructions were performed by P.Y.H. under the supervision of D.A.M. The paper 
was written by M.K.B. and P.L.M., with assistance from P.A.R., A.W.B., K.L.M., and P.Y.H. 
and in consultation with all authors. 


Author Information Reprints and permissions information is available at 
www.nature.com/reprints. The authors declare no competing financial interests. 
Readers are welcome to comment on the online version of the paper. Correspondence 
and requests for materials should be addressed to P.L.M. (plm23@cornell.edu). 


13 AUGUST 2015 | VOL 524 | NATURE | 207 


©2015 Macmillan Publishers Limited. All rights reserved 


LETTER 


METHODS 


Graphene growth. We grow graphene on copper following a standard chemical
vapour deposition process (ref. 14). The copper foil is purchased from Alfa Aesar, stock
number 13382. The copper is annealed for 36 min at 980 °C with a H₂ flow of 60
standard cubic centimetres per minute (s.c.c.m.). Graphene is grown at 980 °C for
20 min with a H₂ flow of 60 s.c.c.m. and a CH₄ flow of 36 s.c.c.m. The foil is then
cooled in a matching environment as quickly as possible.

Graphene characterization. Typical Raman spectra, scanning electron micro- 
scope images, and bright-field transmission electron microscope (TEM) images 
all confirm that the growths yielded mostly single-layer graphene with small 
bilayer regions (Extended Data Fig. 1). Dark-field TEM on a variety of growths 
reveal that typical grain sizes are of the order of hundreds of nanometres to 
micrometres. 

Fabrication of cantilevers and kirigami devices. Fabrication follows standard 
graphene processing methods, with the addition of an aluminium release layer. We
evaporate 40 nm of aluminium on 170-µm double-side-polished fused silica
wafers from Mark Optics. We dice the wafers into 2 cm × 2 cm chips, and transfer
graphene to the chips using 2% poly(methyl methacrylate) (PMMA). We then 
etch the copper in ferric chloride (Transene, CE-200) for one hour and rinse with 
five consecutive deionized water baths. We transfer the graphene onto the alu- 
minium-coated chip, and soak overnight in acetone to remove the PMMA. Next, 
we use photolithography to pattern the pads and evaporate 50 nm of gold. We 
pattern the graphene strips and etch away the unwanted graphene with a 25-s 
oxygen plasma. Finally, we soak the chip in a mild (10:1) deionized water/HCl 
solution until the aluminium release layer has completely disappeared. The chip is 
transferred directly to a deionized water bath, which is kept refrigerated between 
uses to discourage bacterial growth. 

Atomic force microscope characterization. Atomic force microscope measure- 
ments on aluminium-free chips that are run in parallel with measured devices 
usually give step heights of 1-3 nm above that of pristine exfoliated graphene 
(Extended Data Fig. 2). Although it is impossible to completely avoid polymer 
residues from standard transfer and fabrication processing, a 2-nm layer of the 
stiffest PMMA (Young’s modulus $Y = 3.3$ GPa, Poisson ratio $\sigma = 0.4$) should add
only about 20 eV to the stiffness (since $\kappa = Yt^3/[12(1-\sigma^2)]$), which is negligible
compared to the measured values.
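
The size of the residue contribution can be checked with the thin-plate expression $\kappa = Yt^3/[12(1-\sigma^2)]$ and the PMMA parameters quoted above. The sketch below is only an arithmetic illustration with a hypothetical function name; it uses the standard plate formula with the Poisson ratio squared.

```python
EV = 1.602176634e-19   # joules per electronvolt


def plate_bending_stiffness(young_pa, thickness_m, poisson):
    """Thin-plate bending stiffness kappa = Y t^3 / [12 (1 - sigma^2)], in joules."""
    return young_pa * thickness_m**3 / (12.0 * (1.0 - poisson**2))


# Values from the text: a 2-nm PMMA layer, Y = 3.3 GPa, Poisson ratio 0.4.
kappa_pmma_ev = plate_bending_stiffness(3.3e9, 2e-9, 0.4) / EV

# Compare with a typical measured value of about 3 keV (from the Methods).
fraction = kappa_pmma_ev / 3000.0

print(f"PMMA residue: {kappa_pmma_ev:.0f} eV ({100 * fraction:.1f}% of 3 keV)")
```

The result is of order 20 eV, well under one per cent of the kiloelectronvolt-scale stiffness measured for the cantilevers.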

Influence of surfactants. The addition of a surfactant reduces the graphene’s 
adhesion to the surface and prevents the graphene from permanently sticking to 
itself. We performed bending stiffness measurements with and without surfactant, 
and found that the presence of surfactant does not measurably affect the bending 
stiffness. A surfactant was used in all kirigami experiments. We used sodium 
dodecylbenzenesulfonate (SDBS) from Sigma-Aldrich (product number 
289957), dissolved in deionized water to a concentration of approximately 
3 mM. During the measurement process some water evaporates and is replaced 
with deionized water, so that the concentration of SDBS remains approximately 
constant over time. 

Extraction of $\langle x_{th}^2\rangle$ from thermal motion. To extract $\langle x_{th}^2\rangle$ from the thermal
motion of the gold pad on the free end of the graphene cantilever, we recorded the
motion at 90 frames per second for about 20 min to ensure that the entire phase
space of the cantilever motion was sampled. The first 20 s from the trace of the free
gold pad on a 40 µm × 10 µm cantilever are shown in Extended Data Fig. 3a. We
tracked the motion of the pad centroid frame by frame using image analysis to
extract the $x$ position of the pad over time; the $x$ direction is perpendicular to the
profile of the free gold pad (see inset of Extended Data Fig. 3a). To extract $\langle x_{th}^2\rangle$
from this thermal motion, we calculated the power spectral density (PSD), which is
the Fourier transform of the autocorrelation of the data, shown in Extended Data
Fig. 3b. In all devices we observed low-frequency $1/f$ noise from the long-timescale
motion of the supporting probe (shown in red). This low-frequency noise was
excluded from further analysis. We fit the data plotted in blue, which resulted from
the thermal motion of the free gold pad, with the theoretical one-sided PSD for
Brownian thermal motion (refs 15, 16; dashed line): $S_{xx}(f) = S_0/[1 + (f/f_c)^2]$, where $S_0$ is
the low-frequency value of the Brownian motion PSD, and $f_c$ is the corner
frequency. The integral of this fitted function yields $\langle x_{th}^2\rangle$ (refs 15, 16): $\int_0^\infty S_{xx}(f)\,df = \langle x_{th}^2\rangle$.
Using $k = k_BT/\langle x_{th}^2\rangle$, where $\langle x_{th}^2\rangle$ = (130 nm)$^2$ for the device shown in Extended
Data Fig. 3, we find that the spring constant for this 40 µm × 10 µm cantilever is
$k = 2.4\times10^{-7}$ N m$^{-1}$ and that the bending stiffness $\kappa = 3$ keV.
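
The integral of the fitted Lorentzian has a closed form, $\int_0^\infty S_0/[1+(f/f_c)^2]\,df = \pi S_0 f_c/2$, so the variance follows directly from the two fit parameters. The sketch below checks the closed form against a simple numerical quadrature. The fit parameters used here are hypothetical (chosen only so that the variance comes out near (130 nm)$^2$) and are not values from the paper; the function names are likewise illustrative.

```python
import math


def lorentzian_psd(f, s0, fc):
    """One-sided Brownian-motion PSD: S_xx(f) = S0 / [1 + (f / fc)^2]."""
    return s0 / (1.0 + (f / fc) ** 2)


def variance_analytic(s0, fc):
    """Closed-form integral of the Lorentzian from 0 to infinity."""
    return 0.5 * math.pi * s0 * fc


def variance_numeric(s0, fc, n=10_000):
    """Midpoint-rule integral using the substitution f = fc * tan(theta),
    which maps the infinite frequency range onto theta in (0, pi/2)."""
    h = 0.5 * math.pi / n
    total = 0.0
    for i in range(n):
        theta = (i + 0.5) * h
        f = fc * math.tan(theta)
        jacobian = fc / math.cos(theta) ** 2      # df/dtheta
        total += lorentzian_psd(f, s0, fc) * jacobian
    return total * h


# Hypothetical fit parameters (NOT taken from the paper).
S0 = 1.076e-14   # m^2/Hz, low-frequency plateau
FC = 1.0         # Hz, corner frequency

var_a = variance_analytic(S0, FC)
var_n = variance_numeric(S0, FC)
print(f"analytic: {var_a:.3e} m^2, numeric: {var_n:.3e} m^2")
print(f"rms amplitude: {math.sqrt(var_a) * 1e9:.0f} nm")
```

Integrating the fitted function rather than the raw spectrum is what allows the low-frequency $1/f$ points to be left out without biasing the variance.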

Interferometric measurements. The wavelength of the laser that was used for the 
interferometric measurements is 436 nm; in water this means that the separation 
of the black and white bands is 436/4/1.33 = 82 nm. We used a 10-nm full-width- 
half-maximum bandpass filter with a centre wavelength of 430 nm on the 436-nm 
line of a mercury arc lamp. The reflectivity of the glass–water–graphene–water
cavity creates a situation where the reflectivity changes from 0.0026 to 0.0067 
between dark and light bands (based on thin-film equations). A single sheet of 
graphene in water has a reflectivity of 0.0002, so our geometry greatly enhances the 
visibility of graphene. 

Electrical measurements of stretchable transistors. Electrical measurements 
were conducted in an approximately 10 mM KCl solution, with a few drops of
about 3 mM SDBS solution added. Since water evaporates during the measure-
ment process, we periodically add deionized water to keep the concentration of 
SDBS approximately constant. The solution was gated by a gold wire, and the 
gate-drain current was minimized by contacting the drain electrode with a par- 
ylene-C-coated tungsten probe. An Ithaco 1211 current preamplifier was used to 
measure the current, and the system had a negligible gate-drain leakage current of 
about 10 nA. The device geometry in Fig. 3d is equivalent to approximately 40
squares in series. 

Laser force actuation and calibration. Force-displacement curves for cantilevers 
and a pyramid were measured using the radiation pressure of a 1,064-nm laser. 
The spring constant was determined by finding the slope of a linear fit to the data. 
The power of the laser was adjusted using an acousto-optic modulator. The force 
delivered to a gold pad was calibrated experimentally using the known weight of 
the gold. For a given power, the laser was focused near the centre of the gold pad, 
and the change in displacement was measured using a piezo attached to the 
objective of the optical microscope. 
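
The step of obtaining the spring constant as the slope of a linear fit to the force–displacement data can be sketched as an ordinary least-squares fit. The data below are synthetic, generated from an assumed spring constant of 2 × 10$^{-6}$ N m$^{-1}$ with small deterministic offsets standing in for measurement scatter; they are not measurements from the paper, and `fit_line` is a hypothetical helper.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Synthetic illustration only: displacements of 1-7 um and forces of a few pN.
TRUE_K = 2e-6                                       # N/m, assumed for the example
displacements = [i * 1e-6 for i in range(1, 8)]     # m
scatter = [+1, -1, 0, +1, -1, 0, 0]                 # in units of 0.05 pN
forces = [TRUE_K * x + 0.05e-12 * s for x, s in zip(displacements, scatter)]

k_fit, intercept = fit_line(displacements, forces)
print(f"fitted spring constant: {k_fit:.2e} N/m")
```

In practice the fit would be restricted to the low-force region where the response is linear, as noted for the pyramid in Fig. 4b.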

Three-dimensional reconstruction of kirigami devices. We reconstructed the 
three-dimensional shape of graphene kirigami devices from a z-scanned focal 
series (Supplementary Video 2). We first acquire a series of images, varying the 
position of the lens to scan through a z depth of 100 µm in 10-nm steps. The
images are background-subtracted to remove fixed-pattern features on the lens 
and camera. We then find the focal plane of each x-y pixel by using a refined 
minimum-intensity algorithm to look for the z plane with the highest contrast. 
Because the graphene is much thinner than the depth of focus, we assume it 
behaves like a point object in the z direction. In this model, we use Gaussian fits 
to find the z centre of the dip in intensity for each x-y pixel. For graphene near the 
gold pads, we restrict the fit positions to exclude shadowing effects from the pads 
as they go out of focus. The resulting matrix of z positions is cropped to the size of 
our object, converted to a three-dimensional matrix of intensities, and smoothed 
with a three-dimensional Gaussian blur of about 400 nm to reduce noise. Finally, 
we use TomViz (a development of Paraview that is optimized for tomography 
visualizations) to render the three-dimensional object. The colour map is based on 
the intensity of the original video. 
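
The per-pixel focal-plane search described above can be illustrated with a much simpler stand-in. The sketch below locates the centre of an intensity dip along z for a single pixel by taking the darkest frame and refining it with a three-point fit to the logarithm of the dip depth, which is exact for a noise-free Gaussian dip sampled at uniform z steps. This is a simplified substitute for the full Gaussian fitting used in the reconstruction, not the authors' algorithm; the function name, the synthetic profile and all parameter values are assumptions of the example.

```python
import math


def dip_centre(z_positions, intensities, background):
    """Estimate the z centre of a Gaussian-shaped intensity dip for one pixel.

    Assumes uniformly spaced z_positions. Falls back to the darkest frame
    when the three-point refinement is not applicable.
    """
    depths = [background - value for value in intensities]
    m = max(range(len(depths)), key=depths.__getitem__)    # darkest frame
    if m == 0 or m == len(depths) - 1 or min(depths[m - 1:m + 2]) <= 0:
        return z_positions[m]
    # log(depth) of a Gaussian dip is a parabola in z; find its vertex.
    left, mid, right = (math.log(d) for d in depths[m - 1:m + 2])
    curvature = left - 2.0 * mid + right
    if curvature == 0:
        return z_positions[m]
    step = z_positions[m + 1] - z_positions[m]
    return z_positions[m] + 0.5 * step * (left - right) / curvature


# Synthetic single-pixel focal series (illustrative values only).
zs = [0.5 * i for i in range(41)]                     # um
true_z0, sigma, depth = 7.3, 1.2, 0.2
profile = [1.0 - depth * math.exp(-(z - true_z0) ** 2 / (2 * sigma**2)) for z in zs]

z0 = dip_centre(zs, profile, background=1.0)
print(f"recovered dip centre: {z0:.3f} um")
```

Repeating this for every x-y pixel gives the matrix of z positions that is then cropped, smoothed and rendered; with noisy data a fit over more than three frames would be the more robust choice.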

Extended Data Figure 1 | Characterization of representative graphene.
a, Scanning electron microscope image of chemical vapour deposition
graphene on copper foil. All the graphene used in these experiments was
predominantly single-layer, with some small bilayer patches. The larger-scale
contrast variations show the copper grains. Scale bar is 10 µm. b, Raman
spectrum of chemical vapour deposition graphene transferred to SiO₂/Si
(285-nm oxide layer) substrate. The spectrum shows graphene’s characteristic
G peak at 1,580 cm$^{-1}$ and two-dimensional peak at 2,700 cm$^{-1}$; the ratio
between the two indicates that the graphene is primarily monolayer. A small D
peak at 1,350 cm$^{-1}$ indicates low disorder. The small peak at 2,450 cm$^{-1}$ is
background. c, High-contrast bright-field TEM image of graphene transferred
over 10-nm-thick Si₃N₄ windows shows continuous monolayer graphene.
Scale bar is 1 µm. d, False-colour composite image of dark-field TEM images
showing grain size and shape. The graphene is polycrystalline, with grain
sizes of the order of micrometres. Scale bar is 1 µm. e, TEM diffraction pattern
for region shown in c and d.

Extended Data Figure 2 | Atomic force microscopy of graphene.
a, Exfoliated, unprocessed monolayer graphene. Step height along the red line
shown on the height (z) map is 1.0 ± 0.3 nm. b, c, Representative data from
aluminium-free chips that are run in parallel with the devices used in bending
stiffness measurements. Step heights are 2.5 ± 0.4 nm and 2.4 ± 0.5 nm.
Chips that look clean under the optical microscope typically have 2–4 nm total
step heights. We occasionally see higher residue lines at the edges of the
graphene, as in c. The PMMA residue is not sufficiently thick to explain the
notably increased bending stiffness of the graphene (see Methods). Scale bars
are 1 µm.

Extended Data Figure 3 | Thermal motion of graphene cantilever gold pads.
a, Time trace of the x position of the gold pad centroid on a 40 µm × 10 µm
graphene cantilever, showing the first 20 s of a 20-min trace. The x direction
is perpendicular to the profile of the gold pad, as indicated in the inset.
b, PSD ($S_{xx}$) of the full 20-min time trace from a. The blue data points were
included in the fit to the Brownian motion PSD function (dashed line); the red
data points from about $10^{-3}$ Hz to about $10^{-1}$ Hz were excluded, because they
show considerable 1/f noise as a result of the motion of the probe holding
the cantilever. We integrate the fitted function $S_{xx}$ to determine $\langle x_{th}^2\rangle$. For the
device shown, $\langle x_{th}^2\rangle$ = (130 nm)$^2$, the spring constant $k = 2.4\times10^{-7}$ N m$^{-1}$,
and the bending stiffness $\kappa = 3$ keV.

Extended Data Figure 4 | Bending stiffness measurements. a, We also
performed a rough measurement of the spring constant using the force of
gravity on the gold pads. After lifting the cantilever off the surface, the vertical
deflection $x_g$ is determined using the shallow depth of focus of the microscope,
adjusted for the change in index of refraction. The gravitational force $F_g$
(corrected for buoyancy) yields the spring constant: $k = F_g/x_g$. For the
50-µm-long cantilever shown (scale bar is 10 µm) with a 2-pN gold pad and
$x_g = 25$ µm, we find that $k = 8\times10^{-8}$ N m$^{-1}$. We repeated the measurement
for a variety of devices of varying length $L$ and width $W = 10$ µm. We have
observed that the cantilevers sometimes curve downwards even in the absence
of an applied force, presumably due to residual materials or strains in the
graphene, so these gravitational measurements probably have a systematic
offset. b, The measured spring constants of 10-µm-wide devices are shown on a
plot of spring constant versus device length for the thermal fluctuation (black),
gravitational deflection (blue), and laser force measurements (red). The data
from all three techniques (plus additional laser data for devices with other
widths, as in Fig. 2) are shown in the inset as a histogram. Data with stars are
from the same device, using the three different measurement techniques.

Dosage delivery of sensitive reagents enables glove-box-free synthesis

Aaron C. Sather, Hong Geun Lee, James R. Colombe, Anni Zhang & Stephen L. Buchwald


Contemporary organic chemists employ a broad range of catalytic 
and stoichiometric methods to construct molecules for applications
in the material sciences, and as pharmaceuticals, agrochemicals,
and sensors. The utility of a synthetic method may be
greatly reduced if it relies on a glove box to enable the use of air- 
and moisture-sensitive reagents or catalysts. Furthermore, many 
synthetic chemistry laboratories have numerous containers of 
partially used reagents that have been spoiled by exposure to the 
ambient atmosphere. This is exceptionally wasteful from both an 
environmental and a cost perspective. Here we report an encap- 
sulation method for stabilizing and storing air- and moisture- 
sensitive compounds. We demonstrate this approach in three 
contexts, by describing single-use capsules that contain all of the 
reagents (catalysts, ligands, and bases) necessary for the glove-box- 
free palladium-catalysed carbon–fluorine, carbon–nitrogen,
and carbon–carbon bond-forming reactions. This strategy
should reduce the number of error-prone, tedious and time- 
consuming weighing procedures required for such syntheses and 
should be applicable to a wide range of reagents, catalysts, and 
substrate combinations. 

We sought to develop a system to allow for the bench-top storage of 
pre-measured quantities of air- and moisture-sensitive reagents and 
catalysts in such a way that the contained material would be liberated 
into a reaction mixture upon subjection to typical reaction conditions. 
We initially chose paraffin wax as a stabilizing agent as it has been 
shown to be an effective material for protecting sensitive compounds 
from oxygen and water in the atmosphere. For instance, a paraffin
wax dispersion of normally pyrophoric potassium hydride can be
easily handled and is relatively stable under ambient laboratory conditions.
As such, preliminary work focused on creating dispersions of
reagent and reagent mixtures using molten paraffin wax, although it 
was not possible to achieve a uniform distribution of the components 
using this method. Upon cooling, a gradient was established within the 
paraffin matrix, making it impossible to determine the concentration 
of the constituents for a given sample. Moreover, reagents located on 
the surface are exposed to the atmosphere, and free to react with air 
and water. To address these shortcomings, we developed a simple 
method to enclose premeasured amounts of catalysts and reagents 
within paraffin capsules, isolating them from the atmosphere. 
Hollow paraffin (melting point 58-62 °C) shells were manually pre- 
pared and filled with catalyst and reagent combinations, thus provid- 
ing a single stabilized entity with which to conveniently carry out a 
variety of transformations (Supplementary Figs 1-4). 

To probe the effectiveness of the encapsulation technology, we first 
studied the oxygen- and moisture-sensitive palladium-catalysed nucleophilic
fluorination of aryl triflates (ArOTfs) (Fig. 1a). Fluorinated
aromatics are a common motif found in pharmaceuticals and
agrochemicals, and are introduced to impart metabolic stability and
enhanced lipophilicity. The introduction of a fluorine atom can also
increase protein-binding affinity and affect the orientation and conformation
of a molecule when binding to a protein. As a result, the
synthesis of fluorinated compounds has generated great interest.


Traditional methods of incorporating a fluorine atom onto an
aromatic ring typically require harsh conditions, which limits the
scope of these transformations and necessitates the introduction of
fluorine early in the synthesis. In contrast, palladium catalysis allows
for the late-stage transformation of ArOTf and aryl bromides (ArBr) to
the corresponding aryl fluoride (Ar-F), providing good yields and
exhibiting a much broader substrate scope. In addition to the well-documented
challenges associated with this transformation, which
include a difficult reductive elimination (RE) step, care must be taken
to exclude water to prevent proto-demetallation (ArH) and formation
of phenol (ArOH) and biaryl ether (Ar₂O) side products. The metal
fluoride salts (caesium fluoride (CsF) and silver(I) fluoride (AgF)) used
in these reactions are hygroscopic, and the Pd(0) precatalyst is sensitive
towards oxygen, which requires the reaction to be set up in a
glove box.

To address problems arising from stability, the hollow paraffin 
shells were charged with 2 mol% P1 (4 mol% of Pd) using L1 as the
supporting ligand and 3 mmol of CsF (Fig. 1b, blue capsule), and 
stored on the bench top. With the capsules in hand, the reaction set- 
up is inherently simple. The desired ArOTf (1 mmol) is added to an 
oven-dried reaction tube equipped with a stir bar, followed by a cap- 
sule. After evacuating the tube and backfilling with argon, solvent is 
added. Upon heating to the specified temperature, the capsule melts 
and releases its contents, initiating the transformation. When the reac- 
tion is complete, the paraffin is easily removed by precipitation, filtra- 
tion, and silica gel chromatography. 
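
The loadings quoted in this paragraph translate into absolute amounts with simple arithmetic. The following is a minimal sketch and not part of the reported procedure: the function name and its parameters are hypothetical, the factor of two palladium centres per P1 is inferred from the stated ratio of 2 mol% P1 to 4 mol% Pd, and the molar mass of CsF (about 151.90 g/mol) is a standard value rather than one given in the text.

```python
# Illustrative arithmetic only; names and defaults are hypothetical.
CSF_MOLAR_MASS_G_PER_MOL = 151.90  # standard value for caesium fluoride


def capsule_amounts(substrate_mmol=1.0, p1_mol_percent=2.0,
                    pd_per_p1=2, csf_mmol_per_mmol_substrate=3.0):
    """Amounts of P1, Pd and CsF implied by the quoted loadings."""
    p1_mmol = substrate_mmol * p1_mol_percent / 100.0
    pd_mmol = p1_mmol * pd_per_p1  # assumption: two Pd centres per P1
    csf_mmol = substrate_mmol * csf_mmol_per_mmol_substrate
    csf_mg = csf_mmol * CSF_MOLAR_MASS_G_PER_MOL  # mmol x g/mol = mg
    return {"p1_mmol": p1_mmol, "pd_mmol": pd_mmol,
            "csf_mmol": csf_mmol, "csf_mg": csf_mg}


if __name__ == "__main__":
    for name, value in capsule_amounts().items():
        print(f"{name}: {value:.3f}")
```

For a 1 mmol reaction this corresponds to 0.02 mmol of P1, 0.04 mmol of Pd and roughly 456 mg of CsF per capsule.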

With this method, a variety of aryl (1-4) and heteroaryl (5, 6) fluor- 
ides could be prepared in yields that are comparable to those obtained 
with the aid of a glove box (Fig. 1c). While some examples were prev- 
iously reported using lower catalyst loadings (2-3 mol% Pd), 2 mol% 
P1 (4 mol% Pd) was loaded into each capsule to provide a universal 
reagent capable of transforming all desired ArOTfs—facilitating opera- 
tional simplicity. To demonstrate the robustness of this technology, 
a capsule was suspended in a beaker of water for 24 h, dried with a
paper towel, and used in a reaction to provide the Ar-F in undimin- 
ished yield (Fig. 1c, 3). However, a capsule that was kept on the bench 
top at room temperature for over eight months showed decreased 
activity and required elevated reaction temperatures to achieve full 
conversion of the starting material (Supplementary Table 1). 

With this initial success, we applied the capsule method to the 
palladium-catalysed nucleophilic fluorination of aryl bromides 
(ArBr) (Fig. 2). As previously described, two fluoride salts are required
for this transformation (KF and AgF), as well as a palladium(0) 
precatalyst with either L2 or L1 as the supporting ligand. Because 
the L2-supported precatalyst (P2) is effective for the fluorination of 
both aryl and heteroaryl bromides, it was selected as the optimal cata- 
lyst for use with the wax capsules. As in the preceding example, the 
hollow paraffin shells were charged with both P2 and the reagents 
necessary to transform 1 mmol of ArBr to the desired Ar-F (Fig. 2a, 
red capsule). 

These three-component capsules were able to provide a range of 
aryl (7-9) and heteroaryl (10-12) fluorides from commercially
available ArBr in good yields, which rival those that were obtained
when the reactions were set up in a glove box (Fig. 2b). Again, to
test the capsules’ robustness, a capsule was placed in a beaker of water
for 24 h. Once dried, the activity of this capsule matched that of a
capsule that never made direct contact with water (Supplementary
Table 3).

Figure 1 | Wax capsules for the glove-box-free Pd-catalysed nucleophilic
fluorination of aryl triflates. a, The catalytic cycle of a typical palladium-catalysed
cross-coupling reaction; sensitive aspects are highlighted for clarity.
L is a ligand, OA is oxidative addition, TM is transmetallation, and RE is
reductive elimination. M is either a counter cation or a proton. COD is
1,5-cyclooctadiene. b, Contents of the wax capsule for the fluorination of
ArOTf. The green F (fluorine) highlights the site of the transformation.
c, Glove-box-free fluorination of ArOTf. Isolated yields are reported as an
average of two runs. *Isolated yields that were previously reported and obtained
using a glove box to set up the reactions. †Isolated yield after soaking a capsule
in water for 24 h.

Figure 2 | Wax capsules for the glove-box-free Pd-catalysed nucleophilic
fluorination of aryl bromides. Ad is 1-adamantyl. THF, tetrahydrofuran.
TBME, tert-butyl methyl ether. The green F (fluorine) highlights the site of
the transformation. a, Contents of the wax capsules for the fluorination of
aryl bromides. b, Glove-box-free fluorination of aryl bromides. Isolated
yields are reported as an average of two runs. *Isolated yields that

To highlight the generality of this approach, we applied the 
paraffin capsule technology to other reaction types that are useful in 
a variety of research areas. The first method we pursued was the 
palladium-catalysed cross-coupling of aryl halides with amine 
nucleophiles, which has become an indispensable tool for applications
in materials science, sensor synthesis, and pharmaceutical
development. Over the years, our laboratory has developed a series
of biaryl monophosphine ligands and highly efficient base-activated,
ligated Pd(II) precatalysts for C-N bond formation that are commercially
available.

Although the components of this reaction are not sensitive to oxy- 
gen, the base required is hygroscopic, and must be kept in a glove box 
or stored in a desiccator. Additionally, it was discovered that a dual 
ligand mixture composed of L3 and L4 yielded a system capable of 
coupling both primary and secondary amine nucleophiles. Thus, a
paraffin capsule containing L3-based precatalyst (P3), L4, and sodium 
tert-butoxide (base, Fig. 1a) would be capable of coupling a breadth of 
primary and secondary amines by the addition of a single universal 
encapsulated reagent (Fig. 3a, orange capsule), eliminating the need 
for time-consuming reaction optimizations. Indeed, these capsules 
coupled a primary alkyl amine (13), an acyclic secondary amine 
(14), a cyclic secondary amine (15), anilines (16 and 17), and a primary
hetero-aromatic amine (18) to aryl halides and heteroaryl chlorides 
(Fig. 3b). The capsules were stored on the bench top and showed no 
signs of degradation over a period of over eight months, even though 
the base-activated P3 was stored in close contact with sodium 
tert-butoxide (Fig. 3b, 18). 

The palladium-catalysed Negishi cross-coupling of 2-pyridylzinc 
dioxanate with aryl halides and triflates was also adapted for use with
paraffin wax capsules. The 2-pyridyl group has found applications in
functional materials and is a component of biologically active compounds.
Traditional 2-pyridyl nucleophiles such as boronates suffer
from instability, which makes them difficult to employ in Suzuki-Miyaura
cross-coupling reactions. In contrast, the dioxane-stabilized
2-pyridylzinc reagent is a solid, competent nucleophile that can be
briefly manipulated in air, although prolonged storage is problematic
owing to its sensitivity to water.

Figure 3 | Wax capsules for the Pd-catalysed C-N cross-coupling of 1° and 2°
amines with aryl halides. Ms, methanesulfonate. a, Contents of the wax
capsule for the amination of aryl halides. b, Examples of C-N coupling using
the wax capsules. The purple portion of the molecule indicates the amine
coupling partner, highlighting the site of the transformation. Isolated yields are
reported as an average of two runs. *Previously reported isolated yields.
†Toluene was used as the reaction solvent. ‡Isolated yield after storing a capsule
on the bench top for over eight months.

Encapsulation of the basic 2-pyridylzinc dioxanate (MNu, Fig. 1a)
with base-activated palladium precatalyst (P4) within a paraffin wax 
capsule provides a bench-top-stable reagent and an efficient means of 
introducing this important functional group to a variety of (hetero)aryl
halides and triflates (Fig. 4a, purple capsule). With this technology,
(hetero)aryl chlorides (19 and 20), aryl triflate (21), and
(hetero)aryl bromides (22, 23, and 24) were easily converted to 
the desired 2-pyridyl compounds. To demonstrate the stability of 
the zinc reagent, capsules containing 2-pyridylzinc dioxanate that 
have been stored on the bench top for one year were shown by 
titration to contain the original amount of active material (Supple- 
mentary Table 5). 

We have reported that several valuable oxygen- and water-sensitive 
cross-coupling catalysts and reagents can be stabilized by encapsula- 
tion within inert, hydrophobic wax capsules. These capsules provide 
access to an array of desirable cross-coupled products by the conveni- 
ent addition of a single, user-friendly, bench-top-stable reagent. 
Through collaboration with chemical providers, the manual capsule 
preparation process should be easy to mechanize for large-scale production,
making this technology widely available for a variety of traditionally
sensitive compounds. Furthermore, we envision that this
concept will transform other moisture- and air-sensitive reagents
(such as ZnCl₂, AlCl₃, AgF₂) by turning reactions that employ these
into operationally simpler and more robust processes.

Figure 4 | Wax capsules for the Pd-catalysed Negishi cross-coupling of
2-pyridylzinc dioxanate. Diox, dioxane. Boc, tert-butoxycarbonyl. The orange
portion of the molecule indicates the pyridine coupling partner, highlighting
the site of the transformation. a, Contents of the wax capsule for the Negishi
cross-coupling of 2-pyridylzinc. b, Examples of Negishi cross-coupling using
the wax capsules. Isolated yields are reported as an average of two runs. *As
previously reported, an additional 4 mol% L5 and 2 mol% P4 were added.
†Previously reported isolated yields.

longevity of flat slabs

Sanja Knezevic Antonijevic, Lara S. Wagner, Abhash Kumar, Susan L. Beck, Maureen D. Long, George Zandt,
Hernando Tavera & Cristobal Condori


Flat-slab subduction occurs when the descending plate becomes 
horizontal at some depth before resuming its descent into the 
mantle. It is often proposed as a mechanism for the uplifting of 
deep crustal rocks (‘thick-skinned’ deformation) far from plate 
boundaries, and for causing unusual patterns of volcanism, as 
far back as the Proterozoic eon. For example, the formation of
the expansive Rocky Mountains and the subsequent voluminous 
volcanism across much of the western USA has been attributed to a 
broad region of flat-slab subduction beneath North America that 
occurred during the Laramide orogeny (80-55 million years ago).
Here we study the largest modern flat slab, located in Peru, to 
better understand the processes controlling the formation and 
extent of flat slabs. We present new data that indicate that the 
subducting Nazca Ridge is necessary for the development and 
continued support of the horizontal plate at a depth of about 
90 kilometres. By combining constraints from Rayleigh wave phase 
velocities with improved earthquake locations, we find that the flat 
slab is shallowest along the ridge, while to the northwest of the 
ridge, the slab is sagging, tearing, and re-initiating normal subduc- 
tion. On the basis of our observations, we propose a conceptual 
model for the temporal evolution of the Peruvian flat slab in which 
the flat slab forms because of the combined effects of trench retreat 
along the Peruvian plate boundary, suction, and ridge subduction. 
We find that while the ridge is necessary but not sufficient for the 
formation of the flat slab, its removal is sufficient for the flat slab 
to fail. This provides new constraints on our understanding of 
the processes controlling the beginning and end of the Laramide 
orogeny and other putative episodes of flat-slab subduction. 

Oceanic plates subduct at different angles ranging from steep to 
shallow, with flat slabs representing the horizontal endmember. The 
subduction of buoyant aseismic ridges and plateaus comprising overthickened
oceanic crust has long been thought to play a part in the
formation of flat slabs. More recent work has identified other potential
contributing factors, including trench retreat, rapid overriding
plate motion, and suction between the flat slab and overriding continental
mantle lithosphere. Many of these studies do not preclude the
need for additional buoyancy from overthickened oceanic crust.
However, a few recent studies suggest that subducting ridges do not
affect the formation or sustainability of flat slabs.

To evaluate the influence of subducting ridges on the evolution of 
flat slabs, we focus on the flat slab in southern Peru (Fig. 1). Here, the 
subducting Nazca Ridge trends at an oblique angle to relative plate 
motion, resulting in a northward migration of the overriding continent 
relative to the down-going ridge. We have collected and analysed data
from two deployments of broadband seismometers in central and
southern Peru: PULSE (Peru Lithosphere and Slab Experiment),
and CAUGHT (Central Andean Uplift and Geodynamics of High
Topography). We also incorporate data from eight stations from
the PERUSE deployment (Peru Slab Experiment) and the permanent
station called NNA in Lima, Peru (Fig. 1). Here we present a three-dimensional
model of shear-wave velocity structure between −10°
and −18°, obtained from the inversion of earthquake-generated
Rayleigh-wave phase velocities (Fig. 2 and Extended Data Figs 2-10).
We also relocate slab seismicity across our study area using a
double difference methodology (Figs 2 and 3, Extended Data Fig. 1 
and Supplementary Table 1) (see Methods for details). 

Our tomographic images and improved earthquake locations show 
the flat slab to be shallowest along the present-day projected location 
of the subducted Nazca Ridge (Figs 2g and 3g, h). To the south 
(Fig. 2h), the slab transitions abruptly from flat to normal, and earth- 
quake locations align with an increase in shear-wave velocity in our 
model. To the north, where previous studies have proposed a broad flat 
slab of relatively uniform depth, we see a gradual but marked
deepening of the plane of seismicity associated with subducted slab
known as the Wadati-Benioff zone (Figs 2e, f and 3g, h). To the east,
high shear-wave velocities associated with the flat slab extend substantially
farther inboard (that is, away from the trench, inland) than the seismically
active portion of the plate (Figs 2g and 3g, h). The downward bend
in the high-velocity plate at the easternmost extent of the flat slab
appears to coincide with the location of the Peruvian trench about
10 million years ago (Ma).

Of particular note is the geometry of the subducted plate north of
the projected Nazca Ridge track (Fig. 2a-c, e, f). Here, we observe a
dipping high-velocity anomaly to the trenchward side of a dipping
low-velocity anomaly. We note the similarity between these structures
(in an area previously believed to comprise typical flat slab) and those
observed to the south beneath the active arc (Fig. 2e, f, h). We also note
the difference between these structures and those adjacent to the ridge,
where the continuous flat slab is well resolved (Fig. 2e-g and Extended
Data Figs 3, 6-10). We interpret the westward-dipping low-velocity
region parallel to the trench to be evidence of asthenosphere (the
viscous, weak region of the upper mantle) between two torn portions
of subducted plate. The dipping high-velocity anomaly to the west
indicates the presence of a normally dipping slab extending to a depth
of at least 200 km. This is consistent with the location of shear-wave
scatterers identified from converted phases in earlier studies. We
propose that the subhorizontal seismicity to the east of the tear is
located in remnant flat slab that has not yet been fully subducted.
Local shear wave splitting studies show that shear waves move faster
parallel to the trench, consistent with north-south-directed asthenospheric
flow through a break in the Nazca plate. We also note the
presence of a localized high heat flow (196 mW m⁻²) above this low-velocity
anomaly (Fig. 2e). Along the northernmost transect, the
location of the slab is not well resolved above a depth of about
100 km (Fig. 2e). Future work using ambient noise tomography may
help us to resolve the slab geometry here, by providing improved
constraints on velocities at shallower depths.

Figure 1 | Reference map of the Peruvian flat-slab region, illustrating the
subducting Nazca Ridge beneath the advancing South American plate.
Diamonds represent seismic stations used in this study: orange, PULSE; dark
red, CAUGHT; yellow, PERUSE; red, the permanent NNA station. Yellow
stars represent the cities Lima and Cusco. Red triangles represent volcanoes
active during the Holocene epoch. The black arrow indicates the relative
motion of the South American plate with respect to the Nazca Plate.
Dotted white lines show the estimated position of the Nazca Ridge
12-10 Ma and today.

Figure 2 | Three-dimensional model of the structure of shear-wave
velocities between −10° and −18°. a-c, Shear-wave velocities and seismicity
at depths of 75 km (a), 105 km (b) and 145 km (c), and transects along the
northern reinitiating steep slab (A-A’, B-B’), flat slab (C-C’) and southern
steep slab (D-D’) segments. Colours indicate velocity deviations, dVs/Vs (%);
contours show absolute velocities in kilometres per second (numbered).
a-c, Black circles represent stations used in our study; red triangles are
Holocene volcanoes; green stars are earthquakes within 20 km of the depth
shown; black lines refer to cross-sections shown in e-h. The grey
dashed line in b and c shows the location of the trench 10 Ma (ref. 8);
the black dashed line (labelled ‘T’) indicates the location of the slab tear. ‘R’
refers to the resumption of steep subduction at the eastern edge of the flat slab.
d, Inferred flat-slab geometry along the Nazca Ridge track, and slab tear north
of the ridge. e-h, Cross-sections of slab segments shown in a-c. Black dots
show earthquake locations from this study; black inverted triangles are
stations; red triangles are Holocene volcanoes; orange triangle represents
the location of a measurement of unusually high heat flow. Dashed lines
show the inferred top of the slab. The thick black line shows the crustal
thickness.

We incorporate the results of previous geodynamic modelling 
studies with our results, to create a conceptual model of the temporal 
evolution of the Peruvian flat slab (Fig. 3). We begin with the initiation 
of ridge subduction at approximately 11.2 Ma (ref. 8), before which we 
assume normal subduction across our study area (Fig. 3a). From there, 
we base our proposed temporal evolution of the Peruvian flat slab on 
four principles. 

First, we present our conceptual model from the reference frame of a 
laterally stationary Nazca plate. Second, while most of the Nazca plate 
sinks vertically at a relatively constant rate, the plate containing the 
Nazca Ridge ceases to sink at a depth of about 90 km (Fig. 3b-f).
We propose that this is due to buoyancy imparted by the overthickened
oceanic crust and harzburgite layer associated with the ridge,
consistent with previous modelling studies. Third, we observe that
the modern inboard extent of the Peruvian flat slab corresponds to the
location of the trench at about 10 Ma. Given that the projected location
of the Nazca Ridge extends further to the east, this finding suggests that
some portion of the Nazca Ridge has resumed normal subduction. We
propose that, over time, the kinetically slow conversion of basalt and
gabbro to eclogite in the overthickened crust of the Nazca Ridge results
in an increase in the density of the horizontal plate. Given the inboard
extent of the modern flat slab, we propose that, approximately 10 Myr
after entering the trench, the overthickened oceanic crust of the
Nazca Ridge becomes sufficiently eclogitized that it is no longer
neutrally buoyant and therefore resumes its vertical descent (Figs 2b,
g and 3e-g).

Finally, modelling studies indicate that suction between the horizontal
plate and overriding continental lithosphere hinders the
removal of the flat slab. In our study area, this is important because
the portion of the continent under which the flat slab initially forms
moves northwest relative to the ridge over time. To test whether the
flat slab will perpetuate beneath these continental regions after the
departure of the ridge, we apply the fourth principle to our model:
continental regions previously underlain by the flat slab will continue
to have flat slab beneath them for some time (brown regions in
Fig. 3). This results in a broadening of the flat slab as new continental
areas to the south become underlain by the ridge and associated flat
slab, while areas to the north that were previously underlain by the
ridge maintain their flat-slab geometry. This is consistent with earlier
studies that attribute the along-strike (trench parallel) extent of
the Peruvian flat slab to the southward sweep of the Nazca Ridge
over time.

Figure 3 | Proposed evolution of the Peruvian flat slab. a-f, Proposed
contours of the subducted slab, assuming that the ridge remains buoyant for
10 Myr after entering the trench. The approximate location of the subducted
ridge is denoted by the black rectangular outline. Brown areas show areas of
the continent underlain by flat slab at each time step. Triangles indicate
volcanoes active during the 2 Myr following the time of the frame shown. The
location of the South American continent relative to the Nazca Ridge follows
ref. 8. In a, we show the location of the projection of the mirror image of
the Nazca Ridge (in yellow) that formed synchronously with the Nazca Ridge
on the Pacific Plate when these plates were first created at the spreading centre
following ref. 8. In e, red triangles show volcanism from 3 Ma to 2 Ma, and
brown triangles show volcanism from 2 Ma to 1 Ma. In f, volcanism is shown
for 1 Ma to 0 Ma (not including Holocene volcanism). g, Modern seismicity
from this study (large circles) with depths >50 km, and contours as they would
be if the removal of the ridge did not affect the longevity of the flat slab.
h, Modern seismicity from this study and local seismicity at depth >50 km, as
reported in the ISC catalogue for years 2004-2014, shown as smaller circles.
We plot our observed slab contours on the basis of our earthquake locations
and the location of high-velocity anomalies in our tomographic results. Dashed
lines indicate contours that are less certain, either because of a paucity of
earthquakes or because they lie outside of our region of good tomographic
resolution. The pink triangular shape shows the region with very limited
seismicity that may indicate a slab window caused by tearing and the
reinitiation of normal subduction.

The proposed temporal evolution of the Peruvian flat slab shown 
in Fig. 3 combines the influences of trench retreat/overriding plate 
motion, suction, and ridge buoyancy. It assumes that the combina- 
tion of all three forces is necessary for the formation of the flat slab, 
but that the first two are sufficient to perpetuate the flat slab after the 
departure of the ridge. A comparison between our conceptual mod- 
el’s slab geometry at present (Fig. 3g) with actual (observed) slab 
geometry (Fig. 3h) allows us to test these assumptions. The abrupt 
edge of the flat slab that we observe south of the ridge is very similar 
to that proposed by our conceptual model. We note that the dom- 
inant principle controlling the geometry of the flat slab here is the 
effect of ridge buoyancy, as there is no difference in trench rollback 
or continental lithospheric structure that might affect suction along 
strike in this region. Our observations therefore support the necessary
contribution of the ridge to the formation of flat slabs, but are
also not inconsistent with additional contributions from suction and
trench rollback. 

Differences between the observed slab geometry and the geometry 
derived from our conceptual model are visible to the north of the ridge. 
In this area, the effect of the ridge is no longer present, and the geo- 
metry of the flat slab in our conceptual model is controlled by the 
effects of suction and trench rollback alone. Although both our con- 
ceptual model and our observations indicate a flat slab that broadens to 
the northwest of the ridge, the detailed morphologies are very different. 
In addition to an overall deepening of the flat slab north of the ridge 
(Fig. 3g, h and Extended Data Fig. 1), we observe a clear trench-parallel 
break in the subducted plate and a resumption of normal subduction 
trenchward of this tear (Figs 2a-c and 3h). This strongly suggests that,
despite the presence of suction and trench rollback, the flat slab is no 
longer stable once the buoyant Nazca Ridge has been removed. 
Furthermore, once a break is present, the newly subducted plate 
assumes a normal steep dip angle, rather than a flat-slab geometry. 
In this study we are not able to resolve the northern extent of the 
Peruvian flat slab, nor can we establish the along-strike extent of the 
tear. However, International Seismological Centre Catalog locations 
north of our study area show a gap in seismicity that may be consistent 
with the absence of a flat slab because of a progressively tearing plate 
(Fig. 3h). The northward extent of the flat slab east of the tear may be
due in part to the subduction of the Inca Plateau, although this is
beyond the scope of our study. 

Our model is applicable to all flat-slab geometries in cases for which 
a distinct change of dip angle is observed. This change in dip occurs at 
the depth at which the slab becomes neutrally buoyant. Our results 
may not be applicable to slabs when the dip angle is constant but very 
shallow (for example, in Alaska and in the Cascadia region of the
USA). Slabs that dip at a shallow angle sink at a constant rate, which 
is inconsistent with a period of neutral buoyancy. Such slabs may have 
effects that are similar to those produced by flat slabs, although they do 
not result in a complete cessation of arc volcanism (as occurred during 
the Laramide orogeny and is observed in Peru today), only its inboard 
deflection. 

Our results may provide insights into the final stages of flat-slab 
subduction. Previous studies used volcanic patterns to reconstruct the 
formation and foundering of the Farallon flat slab in the western 
USA. The diversity of models for the progression of this foundering
is indicative of the insufficiency of the constraints provided by
volcanic trends alone. Our results suggest that once the flat slab 
extends some distance away from the buoyant feature, it will begin 
to sink and/or tear. Tearing of the Farallon plate caused by an exces- 
sively wide flat slab may be consistent with tomographic images of 
broken fragments of the Farallon plate.

We conclude that flat slabs form through a combination of trench 
retreat, suction, and the inability of overthickened oceanic crust to sink 
below some depth (about 90 km) until sufficiently eclogitized to
become negatively buoyant once again. Flat slabs that extend laterally 
beyond some critical distance from the buoyant overthickened crust 
will begin to founder, even in the presence of other factors such as 
suction and trench retreat. The Peruvian flat slab provides insights into 
the temporal evolution of flat slabs from initial shallowing to collapse, 
yielding new constraints for the reconstruction of flat-slab genesis and 
the nature of the flat-slab foundering.

METHODS

Earthquake locations. We use ANTELOPE software to auto-detect earthquakes, 
using a short-term average (STA) versus long-term average (LTA) trigger mech- 
anism. The lengths of the STA and LTA moving time windows were chosen to be 
1 s and 10 s, respectively. After manually inspecting the waveforms, we selected 977
earthquakes out of 3,000 auto-detected events. We picked primary (P) and secondary
(S, shear) wave arrival times for 673 slab events using SEISAN. The
selected events have the following characteristics: all events are in the depth range
50-310 km; travel-time misfit is less than 1 s; data are well recorded at a minimum
of ten stations with azimuthal gap ≤270° (see Supplementary Information).
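
The STA/LTA trigger described above can be sketched in a few lines. This is a minimal illustration and not the ANTELOPE detector itself: the squared-amplitude characteristic function, the trailing-window definition, the function names and the trigger threshold of 4 are all assumptions made for the example; only the 1 s and 10 s window lengths come from the text.

```python
# Minimal STA/LTA sketch; names and the threshold are hypothetical.
import math
import random


def sta_lta_ratio(trace, sampling_rate, sta_seconds=1.0, lta_seconds=10.0):
    """Ratio of short-term to long-term average energy for each sample."""
    nsta = max(1, round(sta_seconds * sampling_rate))
    nlta = max(nsta, round(lta_seconds * sampling_rate))
    csum = [0.0]  # running sum of squared amplitudes
    for x in trace:
        csum.append(csum[-1] + x * x)
    ratio = [0.0] * len(trace)  # stays zero until one full LTA window exists
    for end in range(nlta, len(trace) + 1):
        sta = (csum[end] - csum[end - nsta]) / nsta
        lta = (csum[end] - csum[end - nlta]) / nlta
        ratio[end - 1] = sta / lta if lta > 0 else 0.0
    return ratio


def first_trigger(ratio, threshold=4.0):
    """Index of the first sample whose ratio exceeds the threshold."""
    for i, r in enumerate(ratio):
        if r > threshold:
            return i
    return None


if __name__ == "__main__":
    rate = 20.0  # samples per second (assumed)
    rng = random.Random(0)
    trace = [rng.gauss(0.0, 1.0) for _ in range(int(60 * rate))]
    for i in range(600, 640):  # 2 s synthetic arrival starting at 30 s
        trace[i] += 10.0 * math.sin(2 * math.pi * 2.0 * (i - 600) / rate)
    print(first_trigger(sta_lta_ratio(trace, rate)))
```

Both averages end at the current sample, so the ratio rises as soon as energy enters the short window. Candidate detections produced this way would still need the manual waveform inspection described in the text.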

For relative locations we use the program HYPODD (Extended Data Fig. 1).
We calculate differential times between P and S phases recorded at a common
station for each event pair separated by ≤40 km. This interevent distance was
interactively chosen after optimizing the linkage between the events in the first- 
step processing of phase data in HYPODD. Each event is strongly linked to a 
maximum of ten neighbouring events, having at least eight travel-time observations. 
We used the P-wave velocity model of ref. 25 for our starting model, and set 
the crustal thickness to 65 km. We used a Vp/Vs ratio of 1.75 to calculate S-wave 
velocities (where Vp is the P-wave velocity and Vs is the S-wave velocity). 
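
As a rough illustration of the quantities involved, the sketch below derives S-wave velocities from a fixed Vp/Vs ratio, pairs events by hypocentral separation, and forms differential times at common stations. It is not HYPODD: the function names are hypothetical, hypocentres are assumed to be local Cartesian coordinates in km, a 40-km limit is taken as the maximum separation, and the neighbour and observation-count limits described above are not applied.

```python
import math
from itertools import combinations

VP_VS_RATIO = 1.75          # ratio stated in the text
MAX_SEPARATION_KM = 40.0    # interevent distance, assumed here to be a maximum


def s_velocity(vp_km_s, ratio=VP_VS_RATIO):
    """S-wave velocity implied by a P-wave velocity and a fixed Vp/Vs ratio."""
    return vp_km_s / ratio


def event_pairs(hypocentres, max_km=MAX_SEPARATION_KM):
    """Index pairs (i, j), i < j, of events no farther apart than max_km.

    hypocentres: list of (x, y, z) positions in km (assumed local coordinates).
    """
    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(hypocentres), 2)
        if math.dist(a, b) <= max_km
    ]


def differential_times(arrivals_i, arrivals_j):
    """Travel-time difference t_i - t_j at every station common to both events.

    arrivals_*: dict mapping station code -> travel time (s) for one phase.
    """
    common = arrivals_i.keys() & arrivals_j.keys()
    return {station: arrivals_i[station] - arrivals_j[station]
            for station in sorted(common)}


# Hypothetical example values, not data from the study.
pairs = event_pairs([(0.0, 0.0, 100.0), (30.0, 0.0, 100.0), (100.0, 0.0, 100.0)])
dt = differential_times({"A": 10.0, "B": 12.5}, {"A": 9.0, "C": 11.0})
```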
Three-dimensional shear-wave imaging. The three-dimensional imaging of 
shear-wave velocity structure using earthquake-generated Rayleigh waves pro- 
ceeds in two steps: first, we invert for Rayleigh wave phase velocities; subsequently, 
we invert the obtained phase velocities for shear-wave velocities. We use the 
two-plane wave method to invert for Rayleigh wave phase velocities. Observations are 
modelled as a sum of two interfering plane waves, each described by its amplitude, 
phase and backazimuth. Predicted phase and amplitude values are calculated 
using finite frequency sensitivity kernels that incorporate the (Born) single 
scattering approximation. Amplitudes are corrected for geometrical spreading and 
attenuation. We examined 12 periods in the band between 0.007 Hz and 0.03 Hz, 
sensitive to Vs structure from the lower crust (~40-km depth), to the upper mantle 
(~200-km depth). 
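
The idea of modelling an observation as two interfering plane waves can be sketched as follows. This is a simplified illustration only: it assumes a flat geometry, a uniform phase velocity and no finite-frequency kernels, and the function name and coordinate convention (x east, y north, in km) are assumptions rather than part of the published method.

```python
import cmath
import math


def two_plane_wave(x_km, y_km, period_s, phase_velocity_km_s, waves):
    """Amplitude and phase of the sum of plane waves at position (x, y).

    waves: iterable of (amplitude, initial_phase_rad, backazimuth_deg).
    The backazimuth points from the station towards the source, so the wave
    propagates along backazimuth + 180 degrees (clockwise from north).
    """
    wavenumber = 2.0 * math.pi / (period_s * phase_velocity_km_s)  # rad per km
    total = 0j
    for amplitude, phase0, backazimuth_deg in waves:
        azimuth = math.radians((backazimuth_deg + 180.0) % 360.0)
        # Distance travelled along the propagation direction from the origin.
        along = x_km * math.sin(azimuth) + y_km * math.cos(azimuth)
        total += amplitude * cmath.exp(1j * (phase0 - wavenumber * along))
    return abs(total), cmath.phase(total)


# Two equal waves from the same direction: in phase they add, in antiphase
# they cancel (hypothetical values: 50-s period, 4 km/s phase velocity).
amp_sum, _ = two_plane_wave(100.0, 50.0, 50.0, 4.0,
                            [(1.0, 0.0, 270.0), (1.0, 0.0, 270.0)])
amp_cancel, _ = two_plane_wave(0.0, 0.0, 50.0, 4.0,
                               [(1.0, 0.0, 270.0), (1.0, math.pi, 270.0)])
```

In an inversion, the six wave parameters per event would be fitted together with the phase-velocity model; here they are simply supplied.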

Data were collected from several seismic networks: PULSE, CAUGHT, 
PERUSE, and the global network permanent station in Lima, Peru. We picked 
fundamental mode Rayleigh waves for 65 well recorded teleseismic events 
(Extended Data Fig. 2a) with magnitudes ≥5.5. 

We defined the study area with corners at 10° S, 69° W; 18° S, 79° W; 10° S, 69° 
W; and 10° S, 69° W (Extended Data Fig. 2b). The starting velocity model 
(Extended Data Fig. 2c) accounts for different crustal thicknesses across the study 
area. We combine the IASPEI91 velocity model for the mantle and the model of 
ref. 31 for the crust and use a forward algorithm to predict phase velocities across 
the study region. 

The inversion is regularized with model covariances set to 0.15 km s⁻¹. The 
choice of regularization parameter is based on the stability of both Rayleigh-wave 
and shear-wave inversions. Longer periods are generally less well resolved than 
shorter periods because of their broader sensitivity kernels. The best resolved areas 
are beneath the Western Cordillera, Altiplano, Eastern Cordillera, and coastal 
forearc and, to a lesser extent, the Sub-Andean zone (Extended Data Fig. 3a). 
Resolution within the foreland basin is mostly confined to the area around the 
stations deployed in the foreland basin in eastern Peru. 

In the second step, we invert obtained phase velocities (Extended Data Fig. 4) for 
one-dimensional shear-wave velocities. We use the same starting model as in 
the previous step (Extended Data Fig. 2c). Sensitivity kernels for longer periods are 
substantially broader than the sensitivity kernels for shorter periods, and sample 
greater depths. The peak sensitivities for the periods used in this study range from 
depths of ~40 km (for 33 s) up to ~200 km (for 143 s). Thus, the vertical resolution 
is greatest between ~40 km and ~200 km, and decreases gradually with depth 
(below ~300 km, resolution drops below 0.1; Extended Data Fig. 3b). The model 
covariance obtained for phase velocities from the two-plane wave method was 
used as data covariance to regularize the shear wave velocity inversion. The average 
root mean squared (r.m.s.) misfit between predicted and observed phase velocities 
over all periods indicates an average error of ~0.02 km s⁻¹ (Extended Data 
Fig. 3c). Results of our shear-wave inversions are presented in Fig. 2 and 
Extended Data Fig. 5. 
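
For reference, the r.m.s. misfit quoted above has the usual definition, sketched here for a single grid point. The function name and the example phase velocities are hypothetical; only the definition itself is standard.

```python
import math


def rms_misfit(predicted, observed):
    """Root mean squared difference between predicted and observed values."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical phase velocities (km/s) at four periods for one grid point.
misfit = rms_misfit([3.90, 4.00, 4.10, 4.20], [3.92, 3.98, 4.12, 4.18])
```

Averaging this quantity over all periods at each point gives a map like the one described for Extended Data Fig. 3c.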

Lateral and vertical resolution. The main new features observed in this study 
from the surface-wave tomography include the far inboard extent of flat slab along 
the subducting Nazca ridge, and the slab tear north of the ridge. We performed a 
range of tests to investigate lateral and vertical resolution to ensure the robustness 
of these features (see also Extended Data Figs 3, 6-10). 

Lateral resolution. We plot the resolution matrix rows of isolated model para- 
meters for several periods, with an emphasis on the spatial resolution at three 
locations along the northern profile north of the subducting Nazca Ridge: one 
where we observe re-steepening of the slab, one at the slab tear, and one along the 
flat-slab remnant. We also investigate points at two locations along the subducting 
Nazca Ridge: one where we observe the far inboard extent of the flat slab (‘long flat 
slab’), and one where previous studies suggest the end of flat slab should be 
(‘short flat slab’). The examination of our resolution matrix for these five selected 
nodes is intended primarily to demonstrate that we have sufficient spatial resolu- 
tion to resolve the slab tear north of the ridge and the inboard extent of flat slab 
along the ridge. We focus on intermediate periods because they have peak sens- 
itivity at the most relevant depths (Extended Data Fig. 2c). The tests show that 
these model parameters are able to resolve spatial-scale features smaller than those 
discussed here. The only node for which we observe a particularly broad sensitivity 
cone is the one at the far inboard extent of the flat slab. This finding suggests that, 
while the inboard extent of the flat slab may not be as well resolved as in other 
locations, a shorter flat slab would have been imaged accurately if it did exist. Our 
inboard extent is therefore a conservative estimate. 

To demonstrate the sensitivity of our results to grid node spacing, we plot phase- 
velocity maps for intermediate periods using 0.25° and 0.5° grid node spacing. The 
phase velocity maps with 0.5° grid node spacing show major features that are 
smoother than, but consistent with, major features that are observed on maps with 
0.25° spacing. Further, the dispersion curves for the five selected nodes reflect 
consistency regardless of the grid node spacing. Along the northern profile in both 
cases we observe faster anomalies at 66 s and 77 s, where we observe the 
re-steepened slab, slow anomalies at all intermediate periods where we observe the 
slab tear, and fast anomalies where we observe the flat-slab remnant. Along the 
flat-slab profile we note low phase velocities at the location where previous studies 
suggest a resumption of the steep slab, and high phase velocities at the location 
which we propose to be the end of flat slab. 

We perform a series of checkerboard tests using the surface wave resolution 
matrices to test the size of the anomalies that can be recovered with the varying 
periods used here (Extended Data Fig. 6). These tests show whether we have 
sufficient spatial resolution to recover the size of the anomaly analogous to the 
observed tear and whether we have sufficient resolution to resolve the inboard 
extent of the flat slab. For this reason we plot the five selected nodes. In addition, 
these tests yield a better understanding of the spatial resolution of phase velocity 
maps across the study area and easily reveal areas that suffer from smearing 
(because of preferential ray path direction and/or lack of data). Short and inter- 
mediate periods, with peak sensitivities between 50-km and 150-km depth, are 
able to recover smaller anomalies, equal to and smaller than the lateral extent of the 
observed slab tear. The tests show that we do have sufficient spatial resolution to 
resolve the slab tear, flat slab remnant to the east, and re-steepened slab to the west. 
Longer periods, which mostly sample subslab material, can recover slightly larger 
features. However, both shorter and longer periods are able to resolve the size of 
the anomaly analogous to subducting slab at the end of the flat slab. These checker- 
board tests show that we are able to resolve anomalies where previous studies 
suggested the end of flat slab, while the node representing the far inboard extent of 
flat slab may be streaked because of a lack of crossing rays. Thus, on the basis of 
these tests, we can conclude with confidence that the inboard extent of flat slab 
along the subducting Nazca Ridge is not where previously assumed, but further 
inboard. Resolution at the location representing the far inboard extent of flat slab is 
weak and suffers from smearing. However, our conclusion on the far inboard 
extent of flat slab is supported by constraints from other studies. 
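
A checkerboard test based on a resolution matrix amounts to multiplying a synthetic alternating model by that matrix and comparing the result with the input. The sketch below shows the operation on a toy 2 × 2 grid; the function names and both example matrices are hypothetical and are not taken from the study.

```python
def checkerboard(n_rows, n_cols, amplitude=1.0):
    """Alternating +/- anomaly pattern, flattened in row-major order."""
    return [amplitude if (r + c) % 2 == 0 else -amplitude
            for r in range(n_rows) for c in range(n_cols)]


def recover(resolution_matrix, true_model):
    """Recovered model m_rec = R m_true for a resolution matrix R (list of rows)."""
    return [sum(r_ij * m_j for r_ij, m_j in zip(row, true_model))
            for row in resolution_matrix]


true_model = checkerboard(2, 2)  # [1, -1, -1, 1]

# Perfect resolution: R is the identity and the pattern is recovered exactly.
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
perfect = recover(identity, true_model)

# Strong smearing between horizontal neighbours: each node is averaged with
# the adjacent node of opposite sign, so the pattern is lost.
smeared_r = [[0.5, 0.5, 0.0, 0.0],
             [0.5, 0.5, 0.0, 0.0],
             [0.0, 0.0, 0.5, 0.5],
             [0.0, 0.0, 0.5, 0.5]]
smeared = recover(smeared_r, true_model)
```

Anomalies smaller than the width of the resolution-matrix rows are attenuated in this way, which is why the recoverable anomaly size varies with period.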
Vertical resolution. We test our vertical resolution along all profiles shown in Fig. 2. 
Extended Data Figures 7-10 show recovery tests for: the southern profile, where 
we observe steeply dipping slab (Extended Data Fig. 7); the flat-slab segment along 
the Nazca Ridge (Extended Data Fig. 8); just north of the Nazca Ridge, where we 
observe deepening of earthquakes and the start of slab tear (Extended Data Fig. 9); 
and the northern profile, where we observe the slab tear, re-steepening of the slab 
and flat-slab remnant (Extended Data Fig. 10). 

Extended Data Fig. 7 demonstrates our ability to recover a dipping slab south of 
the ridge. We model a shear wave velocity structure with a 70-km-thick steeply 
dipping slab associated with a velocity of 4.6 km s⁻¹ (Extended Data Fig. 7d). This 
model is based on our interpretations of the observed structures that we show in 
Fig. 2d. We predict dispersion curves for this model using the code of ref. 32, add 
noise to predicted phase velocities, and invert them using the same starting model 
(Extended Data Fig. 7b) and regularization parameters as for the model shown in 
Extended Data Fig. 7c. The Gaussian noise was generated from misfits obtained in 
our final model using the central limit theorem method, and randomly assigned to 
predicted phase velocities. We were able to recover the steeply dipping structure, 
but its thickness appears greater owing to vertical smearing. We were not able to 
recover the full amplitude of the anomaly, but a somewhat lower amplitude 
(4.45-4.55 km s⁻¹). Our model calculated using observed data (Extended Data Fig. 7c) 
indicates shear-wave velocities above 4.55 km s⁻¹. This recovery test suggests that, 
in order to fully recover the amplitude of observed high shear-wave velocities, 
either the slab in Extended Data Fig. 7d needs to be associated with velocities 
greater than 4.6 km s⁻¹, or the thickness of the slab should be greater, or both. 
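
The noise step of the recovery test can be sketched as below. The text does not spell out the 'central limit theorem method', so this sketch assumes one common reading: approximate a standard normal deviate by summing 12 uniform random numbers, then scale it by a misfit-derived standard deviation. The function names, the seed and the example values (0.02 km s⁻¹ noise on a 4.0 km s⁻¹ phase velocity) are illustrative assumptions.

```python
import random


def clt_gaussian(rng, sigma, n_uniform=12):
    """Approximate N(0, sigma^2) deviate from a sum of uniform deviates.

    The sum of n U(0, 1) variables has mean n/2 and variance n/12, so it is
    centred and rescaled to unit variance before applying sigma.
    """
    total = sum(rng.random() for _ in range(n_uniform))
    standard = (total - n_uniform / 2.0) / (n_uniform / 12.0) ** 0.5
    return sigma * standard


def add_noise(phase_velocities, misfit_sigma, seed=0):
    """Return a noisy copy of predicted phase velocities (km/s)."""
    rng = random.Random(seed)
    return [v + clt_gaussian(rng, misfit_sigma) for v in phase_velocities]


# Hypothetical example: 5,000 predictions of 4.0 km/s with 0.02 km/s noise.
noisy = add_noise([4.0] * 5000, 0.02)
mean = sum(noisy) / len(noisy)
std = (sum((v - mean) ** 2 for v in noisy) / (len(noisy) - 1)) ** 0.5
```

With 12 uniforms the deviates are bounded at six standard deviations, which is rarely a limitation for a test of this kind.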
Extended Data Figure 8 demonstrates our ability to differentiate between a flat 
slab with our (greater) inboard extent along the Nazca ridge track (‘long flat slab’) 
and a flat slab with shorter extent, suggested previously (Extended Data Fig. 8g, 
‘short flat slab’). The plots in Extended Data Fig. 8h and i show recovered models. 
The tests show that we are able to recover the flat-slab-related high shear-wave 
velocities. However, we observe vertical smearing, and notice that, in the recovered 
model, slab-related high velocities appear at shallower depths, resulting in high 
velocities in the lower crust and more shallow flat slab. This is also noticeable in 
our model shown in Extended Data Fig. 8e. Owing to vertical smearing and the 
gradational nature of the slab-mantle boundary in oceanic plates, the bottom of 
the slab is poorly resolved. Different layer discretization owing to different crustal 
thicknesses causes the artificial undulating nature of the slab’s positive anomaly, 
also present in our model (Extended Data Fig. 8e). The plots in Extended Data Fig. 
8h, i demonstrate sufficient vertical resolution to recover the end of the flat slab. 
We also plot dispersion curves at two points, representing the shorter end of flat 
slab previously suggested (Extended Data Fig. 8b) and the far inboard end 
(Extended Data Fig. 8c). Dispersion curves predicted for shorter and longer flat 
slab are substantially different and the observed phase velocities match better with 
the longer flat slab. 

Extended Data Figure 9 demonstrates our ability to recover a torn slab to the 
north of the Nazca ridge. Extended Data Figure 10 demonstrates our ability to 
distinguish between torn slab and a continuous slab along the northernmost 
profile. Extended Data Figure 10i, j shows that lateral heterogeneities and dipping 
structures are well recovered (except at shallower depths, where we lose resolution; 
Extended Data Fig. 3b). Again, we are able to recover the flat slab, but with evident 
vertical smearing. The observed dispersion curves at locations at which we observe 
the re-steepened slab (point 1), torn slab (point 2) and flat-slab remnant (point 3) 
are very different. Shorter periods of the torn slab model at point 1 are character- 
ized with low phase velocities, while intermediate periods have much higher 
phase velocities. In contrast, the continuous slab model is associated with high 
phase velocities at both short and intermediate periods. At point 2 both short 
and intermediate periods show low phase velocities for the torn slab model, but 
high phase velocities for the continuous slab model. At point 3 both short and 
intermediate periods are associated with high phase velocities for both torn and 
continuous flat slab. Generally, we are able to reproduce the observed dispersion 
curves with our model of torn slab (Fig. 2d), except for the low phase velocities at 
shorter periods at point 1. This is because we did not introduce low shear velocities 
in the lower crust in our starting model. Dispersion curves for the continuous flat 
slab differ from those observed at points 1 and 2, especially at intermediate periods 
that sample upper mantle material. 
Extended Data Figure 1 | Relocated earthquakes used in this study. Colours show the depth of events (in km) and lines indicate the slab contours in 20-km 
depth increments. Events below 130 km are shown in black. 
Extended Data Figure 2 | Events, grid and starting model used for the 
Rayleigh wave phase velocity inversions. a, Teleseismic events used in the 
study. b, Black diamonds represent grid nodes; blue diamonds represent 
corners used in the two-plane wave methodology; red circles are PULSE 
stations; orange circles are CAUGHT stations; yellow circles are PERUSE 
stations; and yellow star is the permanent NNA station. c, Sensitivity kernels for 
periods used in our study with the one-dimensional starting shear wave 
velocity model. 
Extended Data Figure 3 | Lateral and vertical resolution. a, Resolution for 
the 40-s and 58-s periods. The resolution matrix diagonal for Rayleigh wave 
phase velocities is indicated in grey scale. Red triangles are Holocene volcanoes. 
Black rectangles, circles and stars are stations used here. b, Resolution 
matrix diagonal values for all one-dimensional shear wave velocity inversions. 
Colours of the circles indicate average values of the resolution matrix diagonals 
for each layer. c, The r.m.s. average misfit over all periods at each point after 
our shear wave inversions. Colours represent the misfit in km s⁻¹. Black 
rectangles represent stations used in our study. 
Extended Data Figure 4 | Calculated Rayleigh wave phase velocities for 40-s, 50-s, 58-s, 66-s, 77-s and 91-s periods. Colours and contours indicate absolute 
phase velocities. Red triangles represent Holocene volcanoes. Black rectangles, circles and stars represent stations used in our study. 
… 95 km, 125 km and 165 km. Colours represent velocity deviations with respect 
to the reference model (Extended Data Fig. 2c); contours show absolute … 
relocated using HypoDD. Red triangles represent Holocene volcanoes. 
Black rectangles, circles and stars are stations used in our study. 


©2015 Macmillan Publishers Limited. All rights reserved 


[Extended Data Figure 6 panels: ‘Starting Anomaly’ and ‘Recovered for 45 s’; colour scale: dv/v (%).] 


Extended Data Figure 6 | Checkerboard tests estimated from resolution 
matrix for 45 s, 58 s and 77 s. Colours represent the recovered anomaly. 
Yellow stars along the northernmost profile indicate locations where we … 
stars along the subducting Nazca Ridge refer to locations of our (greater) 
inboard extent along the Nazca ridge track and a flat slab with shorter extent 
suggested previously. 
Extended Data Figure 8 | Recovery tests for the flat-slab segment. 
a, Transect. b, Dispersion curve at a location representing the shorter end of flat 
slab previously suggested. c, Dispersion curve at a location representing the 
greater inboard extent of flat slab (proposed here); error bars represent one 
standard deviation of uncertainty. d, Starting model. e, Model calculated 
using observed data. f, Model with our (greater) inboard extent of flat slab. 
g, Model with shorter flat slab as suggested previously. h, Recovered model 
from f. i, Recovered model from g. 
Extended Data Figure 9 | Recovery tests for the area just north of the Nazca … 
(see Fig. 2d). e, Recovered model. 
Extended Data Figure 10 | Recovery tests for the northern profile where 
we observe the slab tear, a re-steepening of the currently subducting slab 
west of the tear, and the flat slab remnant east of the tear. a, Transect. 
b, Dispersion curve at the location representing the re-steepened slab; error 
… the flat slab remnant. e, Starting model. f, Model calculated using observed 
data. g, Model with slab tear that we propose in this study. h, Model with 
continuous flat slab suggested previously in ref. 12 and other studies. 
i, Recovered model from g. j, Recovered model from h. See Methods for further 
An early modern human from Romania with a recent Neanderthal ancestor 

Qiaomei Fu, Mateja Hajdinjak, Oana Teodora Moldovan, Silviu Constantin, Swapan Mallick, Pontus Skoglund, 
Nick Patterson, Nadin Rohland, Iosif Lazaridis, Birgit Nickel, Bence Viola, Kay Prüfer, Matthias Meyer, Janet Kelso, 
David Reich & Svante Pääbo 


Neanderthals are thought to have disappeared in Europe approximately 
39,000-41,000 years ago but they have contributed 1-3% of 
the DNA of present-day people in Eurasia. Here we analyse DNA 
from a 37,000-42,000-year-old modern human from Pestera cu 
Oase, Romania. Although the specimen contains small amounts of 
human DNA, we use an enrichment strategy to isolate sites that are 
informative about its relationship to Neanderthals and present- 
day humans. We find that on the order of 6-9% of the genome of 
the Oase individual is derived from Neanderthals, more than any 
other modern human sequenced to date. Three chromosomal seg- 
ments of Neanderthal ancestry are over 50 centimorgans in size, 
indicating that this individual had a Neanderthal ancestor as 
recently as four to six generations back. However, the Oase indi- 
vidual does not share more alleles with later Europeans than with 
East Asians, suggesting that the Oase population did not contrib- 
ute substantially to later humans in Europe. 

Between 45,000 and 35,000 years ago, anatomically modern 
humans spread across Europe, while the Neanderthals, present since 
before 300,000 years ago, disappeared. How this process occurred 
has long been debated. Comparisons between the Neanderthal 
genome and the genomes of present-day humans have shown that 
Neanderthals contributed approximately 1-3% of the genomes of all 
people living today outside sub-Saharan Africa, suggesting that 
human populations ancestral to all non-Africans mixed with 
Neanderthals. The size of segments of Neanderthal ancestry in 
present-day humans suggests that this occurred between 37,000 
and 86,000 years ago. However, where and how often this occurred 
is not understood. For example, Neanderthals share more alleles 
with East Asians and Native Americans than with Europeans, which 
may reflect additional interbreeding in the ancestors of eastern 
non-Africans. Surprisingly, analyses of present-day genomes have not 
yielded any evidence that Neanderthals mixed with modern humans 
in Europe, despite the fact that Neanderthals were numerous 
there and cultural interactions between the two groups have been 
proposed. 

More direct insight into the interactions between modern and 
archaic humans can be obtained by studying genomes from modern 
humans who lived at a time when they could have met Neanderthals. 
Recent analyses of genomes from a ~43,000-47,000-year-old modern 
human from western Siberia and a ~36,000-39,000-year-old modern 
human from eastern Europe showed that Neanderthal gene flow 
into modern humans occurred before these individuals lived. The 
Siberian individual’s genome contained some segments of 
Neanderthal ancestry as large as 6 million base pairs (bp), suggesting 
that some Neanderthal gene flow could have occurred a few thousand 
years before his death. 


We report genome-wide data from a modern human mandible, 
Oase 1, found in 2002 in the Pestera cu Oase, Romania. The age of 
this specimen has been estimated to be ~37,000-42,000 years by direct 
radiocarbon dating. Oase 1 is therefore one of the earliest modern 
humans in Europe. Its morphology is generally modern but some 
aspects are consistent with Neanderthal ancestry. Subsequent 
excavations uncovered a cranium from another, probably contemporaneous 
individual, Oase 2, which also carries morphological traits that 
could reflect admixture with Neanderthals. 

We prepared two DNA extracts from 25 mg and 10 mg of bone 
powder removed from the inferior right ramus of Oase 1. We treated 
an aliquot of each of these extracts with Escherichia coli uracil-DNA 
glycosylase (UDG), an enzyme that removes uracils from the interior 
parts of DNA molecules, but leaves a proportion of uracils at the ends 
of the molecules unaffected. Uracil residues occur in DNA molecules 
as a result of deamination of cytosine residues, and are particularly 
prevalent at the ends of ancient DNA molecules. Among the DNA 
fragments sequenced from these two extracts, 0.18% and 0.06%, 
respectively, could be mapped to the human reference genome. We 
prepared three additional DNA libraries from the extract containing 
0.18% human-like molecules, but omitted the UDG treatment to 
increase the number of molecules in which terminal C-to-T substitu- 
tions could be seen and used to identify putatively ancient fragments. 
Because the fraction of endogenous DNA is so small, we used hybrid- 
ization to DNA probes to isolate human DNA fragments from the 
libraries. Applying this strategy to the mitochondrial genome 
allowed the mitochondrial (mt)DNA from the five libraries to be 
sequenced to an average coverage of 803-fold (Supplementary Note 1). 
At the 3’ ends of the DNA fragments, cytosine residues appeared as 
thymine residues relative to the human mtDNA reference in 21% of 
fragments, reflecting appreciable levels of cytosine deamination. This 
suggests that at least some of the human mtDNA is of ancient origin. 
We determined mtDNA consensus sequences in two ways: using all 
mtDNA fragments, and using only deaminated fragments that carry 
C-to-T substitutions at either end relative to the consensus mtDNA 
sequence based on these fragments, an approach known to enrich 
for endogenous DNA. The mtDNA sequence based on all fragments 
clusters with present-day Europeans (Extended Data Fig. 1) 
(Supplementary Note 1). In contrast, the mtDNA sequence based on 
deaminated fragments is related to a large group of present-day 
Eurasian mtDNAs (haplogroup N) but diverges from these before 
they diverged from each other. This Oase 1 mtDNA carries a few 
private mutations on the basis of which its age can be estimated to 
be 36,330 years before present (14,520-56,450; 95% confidence inter- 
val). Using six positions at which the mtDNA sequence differs from at 
least 99% of 311 present-day humans, we estimate the contamination 
Figure 1 | Allele sharing between the Oase 1 individual and other genomes. 
Each point indicates the extent to which the Oase 1 genome shares alleles with 
one or other of a pair of genomes from different populations indicated above 
and below (see Extended Data Table 1 for numbers). Z-scores with an absolute 
value greater than 2 indicate an excess of allele sharing (grey). 


among all mtDNA fragments to be 67% (95% confidence interval 
65-69%). When we restrict to mtDNA fragments that carry terminal 
C-to-T substitutions, the contamination estimate is 4% (95% confid- 
ence interval of 2-9%) (Supplementary Note 1). 
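
The two ingredients of this step, restricting to fragments with terminal C-to-T substitutions and estimating a contamination fraction with a confidence interval, can be sketched as follows. This is not the authors' procedure (see Supplementary Note 1 for that): the function names are hypothetical, the alignment is assumed to be ungapped and on the reference strand, the counts in the example are invented, and the Wilson score interval is simply one standard choice for a binomial proportion.

```python
import math


def has_terminal_c_to_t(fragment, reference):
    """True if the first or last aligned base is a reference C read as T."""
    if len(fragment) != len(reference) or not fragment:
        raise ValueError("expects an ungapped alignment of equal, non-zero length")
    ends = (0, len(fragment) - 1)
    return any(reference[i] == "C" and fragment[i] == "T" for i in ends)


def contamination_estimate(contaminant_count, total_count, z=1.96):
    """Fraction of fragments carrying the contaminant allele, with a Wilson
    score interval (z = 1.96 corresponds to about 95% coverage)."""
    if total_count <= 0 or not 0 <= contaminant_count <= total_count:
        raise ValueError("need 0 <= contaminant_count <= total_count, total > 0")
    p = contaminant_count / total_count
    denom = 1.0 + z * z / total_count
    centre = (p + z * z / (2.0 * total_count)) / denom
    half = z * math.sqrt(p * (1.0 - p) / total_count
                         + z * z / (4.0 * total_count ** 2)) / denom
    return p, centre - half, centre + half


# Hypothetical counts: 670 of 1,000 fragments overlapping diagnostic
# positions carry the allele seen in present-day humans.
estimate, low, high = contamination_estimate(670, 1000)
```

Applying the filter first and then re-estimating contamination on the retained fragments mirrors the comparison reported above.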

To isolate nuclear DNA from Oase 1, we used three sets of oligo- 
nucleotide probes that cover about two million sites that are single 
nucleotide polymorphisms (SNPs) in present-day humans and cap- 
tured DNA molecules from the five libraries. Of the SNPs targeted, 
51% (n = 1,038,619) were covered by at least one DNA fragment, and 
13% (n = 271,326) were covered by at least one fragment with a 
terminal C-to-T substitution. To estimate nuclear DNA contamina- 
tion, we tested whether Oase 1 DNA fragments with or without evid- 
ence of deamination share more alleles with present-day Europeans or 
with East Asians. We found that Europeans share significantly fewer 
alleles with Oase 1 fragments that are deaminated than with Oase 1 
fragments that are not, consistent with European contamination of 
17-30% (Supplementary Note 1). On the basis of these findings and 
those from mtDNA, we restricted all subsequent analyses to DNA 
fragments that carry terminal C-to-T substitutions. After doing this, 
we found that we captured targeted SNPs from the X and Y chromosomes 
at a similar rate, indicating that Oase 1 carried both an X and a 
Y chromosome and thus that he was male. The Y chromosome alleles 
belong to the F haplogroup, which is carried by most males in Eurasia 
today (Supplementary Note 2). 

To determine the relationship of the Oase 1 individual to present- 
day populations, we first tested whether he shared more alleles with 
particular present-day individuals from different populations using 
D-statistics, which provide a robust estimate of admixture almost 
regardless of how SNPs for analysis are chosen. We find that 
Oase 1 shared more alleles with present-day East Asians and Native 
Americans than with present-day Europeans, counter to what might 
naively be expected for an ancient individual from Europe (Fig. 1) 
(5.2 ≤ |Z| ≤ 6.4; Extended Data Table 1). However, it has been suggested 
that Europeans after the introduction of agriculture derive a 
part of their ancestry from a ‘basal Eurasian’ population that separated 
from the initial settlers of Europe and Asia before they split from 
each other. Therefore, we replaced present-day Europeans with 
Palaeolithic and Mesolithic European individuals in these analyses. 
We then find that the Oase 1 individual shares equally many alleles 
with these early Europeans as with present-day East Asians and Native 
Americans (Fig. 1) (|Z| ≤ 1.5 in Extended Data Table 1). Restricting this 
analysis to transversion polymorphisms, which are not susceptible to 
errors induced by cytosine deamination, does not influence this result 
(Extended Data Table 2 and Supplementary Note 3). This suggests that 
the Oase 1 individual belonged to a population that did not contribute 
much, or not at all, to later Europeans. This contrasts, for example, with 
the ~36,000-39,000-year-old Kostenki 14 individual from western 
Russia, who was more closely related to later Europeans than to East 
Asians (1.9 ≤ |Z| ≤ 13.7; Extended Data Table 1). 
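
A D-statistic and its Z-score can be sketched as below. The sketch uses the allele-frequency form D = Σ(w − x)(y − z) / Σ(w + x − 2wx)(y + z − 2yz) over sites, with w, x, y, z the frequencies of one allele in the four populations, and a simple equal-weight delete-one block jackknife for the standard error. The function names and the toy data are hypothetical, and production tools may weight blocks by size, so this should be read as an illustration rather than the computation used for Extended Data Table 1.

```python
import math


def d_statistic(sites):
    """D for sites given as (w, x, y, z) allele frequencies in four populations.

    Positive values indicate that the first population shares more alleles
    with the third population than the second population does.
    """
    numerator = denominator = 0.0
    for w, x, y, z in sites:
        numerator += (w - x) * (y - z)
        denominator += (w + x - 2.0 * w * x) * (y + z - 2.0 * y * z)
    if denominator == 0.0:
        raise ValueError("no informative sites")
    return numerator / denominator


def jackknife_z(blocks):
    """D over all blocks and its Z-score from a delete-one block jackknife."""
    d_all = d_statistic([site for block in blocks for site in block])
    leave_one_out = [
        d_statistic([site for j, block in enumerate(blocks) if j != i
                     for site in block])
        for i in range(len(blocks))
    ]
    n = len(blocks)
    mean = sum(leave_one_out) / n
    variance = (n - 1) / n * sum((d - mean) ** 2 for d in leave_one_out)
    se = math.sqrt(variance)
    return d_all, (d_all / se if se > 0.0 else float("inf"))


# Toy data: each block mixes sites supporting the two alternative groupings.
supports_first = (1.0, 0.0, 1.0, 0.0)    # first and third share the allele
supports_second = (0.0, 1.0, 1.0, 0.0)   # second and third share the allele
blocks = [
    [supports_first] * 3 + [supports_second] * 1,
    [supports_first] * 4,
    [supports_first] * 2 + [supports_second] * 2,
]
d_value, z_score = jackknife_z(blocks)
```

In real data the blocks would be contiguous stretches of the genome, so that linkage between neighbouring SNPs does not make the standard error look too small.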

To assess whether the ancestors of the Oase 1 individual mixed with 
Neanderthals, we tested whether the Altai Neanderthal genome shares 
more alleles with the Oase 1 genome than with sub-Saharan Africans. 
We find this to be the case (|Z| = 7.7; Supplementary Note 4). We then 
asked whether the amount of Neanderthal ancestry in the Oase 1 
genome is similar to that in present-day non-Africans. Surprisingly, 
the Neanderthal genome shares more alleles with the Oase 1 individual 
than it does with any present-day people in Eurasia that we tested, 
indicating that he carries more Neanderthal-like DNA than present- 
day people (5.0 ≤ |Z| ≤ 8.2; Extended Data Table 3). We also observe 
more Neanderthal-like alleles in the Oase 1 individual when we com- 
pare him to four early modern humans: an 8,000-year-old individual 
from Luxembourg, and three individuals from Russia who vary in age 
between 24,000 and 45,000 years (3.6 ≤ |Z| ≤ 6.8; Extended Data 
Table 3). Thus, the Oase 1 individual appears to have carried more 
Neanderthal-like DNA than any other modern human analysed to 
date. This observation cannot be explained by residual present-day 
human contamination among the DNA fragments that carry terminal 
C-to-T substitutions, because all modern humans studied to date carry 
less Neanderthal ancestry than the Oase 1 genome, and thus contam- 
ination would lower, rather than increase, the apparent Neanderthal 
ancestry. 

We estimated the proportion of Neanderthal DNA in the Oase 1 
genome using three different statistics (Supplementary Note 4). 
Although the results differ, they all yield point estimates between 
6.0% and 9.4% (Table 1). For one of the statistics, none of the 90% 
confidence intervals for Neanderthal ancestry in the other modern 


Table 1 | Estimated fraction of the Oase 1 genome that derives from Neanderthals 

Statistic 1: f4(Denisova, Altai; Mbuti, X) / f4(Denisova, Altai; Mbuti, Mezmaiskaya) 
Statistic 2: 1 − f4(Mbuti, Chimp; X, Denisova) / f4(Mbuti, Chimp; Dinka, Denisova) 
Statistic 3: f4(X, Mbuti; Denisova, Chimp) / f4(Altai, Mbuti; Denisova, Chimp) 

             Statistic 1                    Statistic 2                    Statistic 3 
Sample       Proportion  s.e.m.  90% CI     Proportion  s.e.m.  90% CI     Proportion  s.e.m.  90% CI 
Oase 1       8.1%        2.0%    4.8-11.3%  9.4%        1.1%    7.5-11.3%  6.0%        2.0%    2.8-9.3% 
Ust’-Ishim   3.6%        0.9%    2.2-5.0%   5.5%        0.7%    4.3-6.6%   0.4%        1.2%    0.0-2.5% 
Kostenki 14  3.8%        1.0%    2.1-5.5%   2.9%        0.8%    1.6-4.2%   1.7%        1.3%    0.0-3.9% 
MA1          1.2%        1.1%    0.0-3.0%   3.5%        0.8%    2.2-4.8%   2.3%        1.3%    0.1-4.5% 
Loschbour    1.3%        0.9%    0.0-2.8%   3.9%        0.7%    2.7-5.1%   0.5%        1.2%    0.0-2.6% 
La Braña     3.1%        1.0%    1.4-4.7%   1.9%        0.7%    0.7-3.1%   1.4%        1.2%    0.0-3.4% 
Stuttgart    3.0%        0.9%    1.5-4.4%   2.5%        0.7%    1.3-3.7%   0.4%        1.2%    0.0-2.4% 
Han          2.2%        0.9%    0.6-3.7%   2.2%        0.8%    1.0-3.5%   1.0%        1.2%    0.0-3.1% 
Dai          2.6%        0.9%    1.1-4.0%   1.0%        0.8%    0.0-2.3%   0.7%        1.2%    0.0-2.6% 
French       3.0%        0.9%    1.6-4.5%   3.0%        0.7%    1.8-4.2%   0.2%        1.2%    0.0-2.2% 

CI, confidence interval; s.e.m., standard error of the mean; negative values are truncated to 0%. 
Figure 2 | Spatial distribution of alleles matching Neanderthals in modern 
humans. Coloured vertical lines indicate alleles shared with Neanderthals and 
no colour indicates alleles shared with the great majority of West Africans. 


human samples overlap with the confidence interval in Oase 1. When 
we restrict analysis to transversion SNPs, the point estimates of 
Neanderthal ancestry are even higher (range of 8.4% to 11.3%) 
(Extended Data Table 4). 
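
An f4-ratio estimate of the kind listed as Statistic 1 in Table 1 can be sketched as follows, taking f4(A, B; C, D) as the average over sites of (a − b)(c − d), where a, b, c and d are allele frequencies. The function names and the toy frequencies are hypothetical; the population labels and the truncation of negative values to 0% follow Table 1. Standard errors, which in practice come from a block jackknife, are omitted.

```python
def f4(sites, a, b, c, d):
    """f4(A, B; C, D): mean over sites of (a - b) * (c - d).

    sites: list of dicts mapping population name -> allele frequency.
    """
    return sum((s[a] - s[b]) * (s[c] - s[d]) for s in sites) / len(sites)


def neanderthal_proportion(sites, sample):
    """Statistic 1 of Table 1 for the given sample, truncated at zero."""
    numerator = f4(sites, "Denisova", "Altai", "Mbuti", sample)
    denominator = f4(sites, "Denisova", "Altai", "Mbuti", "Mezmaiskaya")
    if denominator == 0.0:
        raise ValueError("denominator f4 is zero; no informative sites")
    return max(0.0, numerator / denominator)


# Toy frequencies in which sample "X" is constructed as 90% Mbuti-like and
# 10% Mezmaiskaya-like at every site, so the estimate should be 0.1.
# Sample "Y" is built to give a negative ratio, which is truncated to 0.
sites = [
    {"Denisova": 0.0, "Altai": 1.0, "Mbuti": 0.0, "Mezmaiskaya": 1.0, "X": 0.1, "Y": 0.0},
    {"Denisova": 1.0, "Altai": 0.0, "Mbuti": 1.0, "Mezmaiskaya": 0.0, "X": 0.9, "Y": 1.0},
    {"Denisova": 0.0, "Altai": 0.0, "Mbuti": 1.0, "Mezmaiskaya": 1.0, "X": 1.0, "Y": 1.0},
    {"Denisova": 1.0, "Altai": 0.0, "Mbuti": 0.0, "Mezmaiskaya": 0.0, "X": 0.0, "Y": 1.0},
]
proportion_x = neanderthal_proportion(sites, "X")
proportion_y = neanderthal_proportion(sites, "Y")
```

Because f4 is linear in the frequencies of the test sample, a sample that is a mixture of two sources recovers its mixing proportion exactly in this idealized setting.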

To study the spatial distribution of Neanderthal DNA across the 
Oase 1 genome, we designed capture probes for around 1.7 million 
nucleotide positions at which nearly all individuals in a sub-Saharan 
African population carry one allele whereas Neanderthal genomes 
carry a different allele. We used these probes to isolate DNA fragments 
from the Oase 1 individual. A total of 78,055 sites were covered by 
deaminated DNA fragments from the Oase 1 individual and were also 
covered by DNA fragments sequenced from the ~36,000-39,000- 
year-old Kostenki 14 individual from western Russia’®, the 
~43,000-47,000-year-old individual from Ust’-Ishim in Siberia’, 
and three present-day human genomes from China, France and 
Sudan (Supplementary Note 5). Because the Dinka from Sudan are 
thought to have little or no Neanderthal ancestry’, we subtracted the 
number of alleles that match the Neanderthals in the Dinka individual 
(485) from the number in the other genomes to estimate the number of 
alleles attributable to Neanderthal ancestry. The resulting numbers of 
putative Neanderthal alleles are 3,746 in the Oase 1 individual, 1,586 
and 1,121 in the Ust’-Ishim and Kostenki 14 individuals, respectively, 
and 1,322 and 1,033 in the Chinese and the European individuals 
(Extended Data Table 5). Thus, the Neanderthal contribution to the 
Oase 1 genome appears to be between 2.3- and 3.6-fold larger than to 
the other genomes analysed. Assuming that the Neanderthal contri- 
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D, Dinka; F, French; H, Han; K, Kostenki 14; O, Oase 1; U, Ust’-Ishim. The 
seven grey bars indicate segments of putative recent Neanderthal ancestry. This 
analysis is based on 78,055 sites. Numbers refer to chromosomes. 


bution to the European individual is 2% (ref. 7), this suggests that 7.3% 
of the Oase 1 genome is of Neanderthal origin. When the numbers of 
alleles matching the Neanderthal genome are compared per chrom- 
osome (Extended Data Table 5), the highest numbers are always 
observed for the Oase 1 genome, except in the case of chromosome 
21, in which the Ust’-Ishim individual carries a large segment of likely 
Neanderthal ancestry. 
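
The conversion from allele counts to an ancestry estimate in the paragraph above can be checked with a few lines of Python. This is a minimal sketch rather than the authors' code: the dictionary and function names are illustrative, the Chinese and European individuals are labelled Han and French as in Extended Data Table 5, and the calibration assumes, as the text does, that the European individual carries 2% Neanderthal ancestry.

```python
# Putative Neanderthal allele counts at the 78,055 screened sites, after
# subtracting the 485 matches seen in the Dinka individual (values from the text).
PUTATIVE_NEANDERTHAL_ALLELES = {
    "Oase 1": 3746,
    "Ust'-Ishim": 1586,
    "Kostenki 14": 1121,
    "Han": 1322,
    "French": 1033,
}


def ancestry_percent(counts, reference="French", reference_percent=2.0):
    """Scale counts so that the reference individual has the assumed ancestry."""
    scale = reference_percent / counts[reference]
    return {name: count * scale for name, count in counts.items()}


def fold_excess(counts, focal="Oase 1"):
    """Ratio of the focal individual's count to each other individual's count."""
    return {
        name: counts[focal] / count
        for name, count in counts.items()
        if name != focal
    }


estimates = ancestry_percent(PUTATIVE_NEANDERTHAL_ALLELES)
folds = fold_excess(PUTATIVE_NEANDERTHAL_ALLELES)
print(f"Oase 1 estimate: {estimates['Oase 1']:.1f}%")
print(f"fold excess: {min(folds.values()):.2f} to {max(folds.values()):.2f}")
```

Because the scaling is a simple ratio to the reference individual, changing the assumed 2% changes every estimate proportionally and leaves the fold differences untouched.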

We plotted the positions of Neanderthal-like alleles across the
Oase 1 genome (Fig. 2). We detect three segments that are over
50 centimorgans (cM) in size, suggesting that the Neanderthal
contribution to the Oase 1 individual occurred so recently in his
family tree that chromosomal segments of Neanderthal origin had little
time to break up due to recombination. To estimate the date of the most
recent Neanderthal contribution to the Oase 1 genome, we studied the
size spans of seven segments of the genome that appeared to be recently
derived from Neanderthals. Their genetic lengths suggest that the
Oase 1 individual had a Neanderthal ancestor as a fourth-, fifth- or
sixth-degree relative (Supplementary Note 5). This would predict that an
average of 1.6% to 6.3% of the Oase 1 genome derived from this recent
Neanderthal ancestor. Visual inspection of the Oase 1 genome suggests
that in addition to these seven segments, other smaller segments
also carry Neanderthal-like alleles (Fig. 2). When we remove the seven
longest segments, the estimate of Neanderthal ancestry in Oase 1 drops
from 7.3% to 4.8%, which is still around twice the 2.0-2.9% estimated
for the French, Han, Kostenki and Ust’-Ishim individuals in this
remaining part of the genome. This additional Neanderthal ancestry
could reflect either an older Neanderthal admixture into the ancestors
of Oase 1 or our failure to find all segments of recent Neanderthal
ancestry.

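
The 1.6% to 6.3% range quoted above matches the expected genome share inherited from a single ancestor a given number of generations back. The sketch below assumes that "fourth-, fifth- or sixth-degree relative" means an ancestor four, five or six generations back (the reading used later in the text); the function name is illustrative.

```python
def expected_fraction(generations_back: int) -> float:
    """Expected share of the genome inherited from one ancestor n generations back."""
    if generations_back < 0:
        raise ValueError("generations_back must be non-negative")
    # Each generation halves the expected contribution of a given ancestor.
    return 0.5 ** generations_back


for generations in (4, 5, 6):
    share = 100 * expected_fraction(generations)
    print(f"{generations} generations back: {share:.4f}% of the genome on average")
```

These are expectations only; the realised share in any one descendant varies because recombination passes on whole segments.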
The Oase 1 genome shows that mixture between modern humans 
and Neanderthals was not limited to the first ancestors of present-day 
people to leave Africa, or to people in the Near East; it occurred later as 
well and probably in Europe. The fact that the Oase 1 individual had a 
Neanderthal ancestor removed by only four to six generations allows 
this Neanderthal admixture to be dated to less than 200 years before 
the time he lived. However, the absence of a clear relationship of the 
Oase 1 individual to later modern humans in Europe suggests that he 
may have been a member of an initial early modern human population 
that interbred with Neanderthals but did not contribute much to later 
European populations. To better understand the interactions between 
early modern and Neanderthal populations, it will be important to 
study other specimens that, like Oase 1, have been proposed to carry
morphological traits indicative of admixture with Neanderthals.
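
The "less than 200 years" bound follows from multiplying the number of generations by a generation interval. The text does not state the interval used, so the 25 to 30 years below are a hypothetical range chosen only to illustrate the arithmetic.

```python
# Hypothetical generation intervals in years; not taken from the paper.
ASSUMED_YEARS_PER_GENERATION = (25.0, 30.0)


def years_since_admixture(generations: int, years_per_generation: float) -> float:
    """Elapsed time for a given number of generations."""
    return generations * years_per_generation


for generations in (4, 6):
    for interval in ASSUMED_YEARS_PER_GENERATION:
        years = years_since_admixture(generations, interval)
        print(f"{generations} generations x {interval:.0f} years = {years:.0f} years")
```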


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
METHODS


DNA extraction and library preparation. We used a dentistry drill to remove
two samples of bone powder from an area where a larger sample had previously
been removed for carbon dating. We prepared two extracts (E1406, E1843) from
25 mg and 10 mg of bone powder, respectively, as described previously. We
produced five libraries from the two extracts using a single-stranded library
protocol (Extended Data Table 6). We treated one library from each extract
(A5227, A5252) with E. coli uracil-DNA glycosylase (UDG) and endonuclease VIII
to remove deaminated cytosine residues from the interior parts of molecules. We
amplified all libraries by PCR for 35 cycles using AccuPrime Pfx DNA polymerase
(Life Technologies) and primers carrying library-specific indexes. We
determined library concentrations using a NanoDrop 2000 spectrophotometer.
Sequencing and DNA capture. We shotgun sequenced the UDG-treated libraries
A5252 and A5227 and found that they contained 0.06% and 0.18% human DNA,
respectively. We used hybridization to oligonucleotide probes to enrich the
libraries for subsets of the nuclear genome containing panels of known SNPs as
described previously, except that each SNP was targeted by four 52-nucleotide
probes: two immediately flanking the SNP on both sides, and two centred on the
SNP containing one or the other alternate allele, respectively. We used four
panels of probes.

Panel 1 “390k”: 394,577 SNPs, about 90% of which are on the Affymetrix
Human Origins array. See ref. 36 for SNPs and probes.

Panel 2 “840k”: 842,630 SNPs constituting the rest of the SNPs on the Human 
Origins array, all SNPs on the Illumina 610-Quad array, all SNPs on the 
Affymetrix 50k array, and smaller numbers of SNPs chosen for other purposes. 
See Supplementary Data 1. 

Panel 3 “1000k”: 997,780 SNPs comprising all transversion polymorphisms
seen in two Yoruba males from Nigeria sequenced to high coverage and
transversion polymorphisms seen in the Altai Neanderthal genome. The design
was restricted to SNPs that passed strict quality filters in the Neanderthal
genome (Map35_99%), and had chimpanzee alleles available. Probes were designed
from chimpanzee flanking sequences. See Supplementary Data 2.

Panel 4 “Archaic”: This panel contains SNPs where the West-African Yoruba
population carries a high frequency of one allele while at least one archaic
individual carries an alternative allele. To determine Yoruba allele
frequencies, we examined data from all Yoruba individuals from the 1000 Genomes
Project covered by at least three sequences passing filters. At these sites we
called majority alleles (drawing a random allele in the case of equal numbers
of reads supporting both alleles). We furthermore restricted the analysis to
sites at which ≥24 Yoruba individuals as well as the Altai Neanderthal and
Denisovan genomes had allele calls (Map35_50% filter). We then selected sites
at which at most one alternative allele is seen among the Yoruba while at least
one of four archaic genomes (Denisovan; Altai, Vindija and Mezmaiskaya
Neanderthals) carries the alternative allele. Ancestral states were taken from
the inferred ancestor of humans and chimpanzees (Ensembl Compara v.64). We used
the following classes of sites. Class 1: 297,894 SNPs where Yoruba is derived
and at least one ancestral allele is seen in the Altai, Vindija, Mezmaiskaya or
Denisova genomes. Class 2: sites where Yoruba alleles are all or nearly all
ancestral and derived alleles are seen in archaic genomes. Since such derived
alleles often arise due to errors in an archaic genome, we restricted this
class to the following three cases: (1) 1,321,774 SNPs where the high-coverage
Altai Neanderthal and/or Denisova genomes are homozygous derived; (2) 523,041
SNPs where the Altai and/or Denisova genomes are heterozygous but are not
C-to-T or G-to-A substitutions relative to the ancestral allele; and (3) 30,735
SNPs that are homozygous ancestral in Altai and/or Denisova and at least one
copy of the derived allele is observed in the Mezmaiskaya or Vindija
Neanderthal genomes, and the derived allele represents a transversion that is
also seen in the Simons Genome Diversity Panel
(https://www.simonsfoundation.org/life-sciences/simons-genome-diversity-project/).
After eliminating SNPs where capture probes covered ambiguous bases in the
human (hg19) and chimpanzee (panTro2) genomes or overlapped for less than 35
nucleotides with mappable regions (Map35_50%), this left us with a set of
1,749,385 SNPs (see Supplementary Data 3).
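
The four-probe layout described under "Sequencing and DNA capture" (two 52-nucleotide probes flanking the SNP and two centred on it, one per allele) can be sketched as follows. This is an illustration under stated assumptions, not the authors' design code: `design_probes` is a hypothetical name, coordinates are 0-based, and because 52 is even the centred probes here place the SNP at offset 26, a choice the text does not specify.

```python
PROBE_LENGTH = 52
UPSTREAM_IN_CENTRED_PROBE = 26  # assumed offset of the SNP within a centred probe


def design_probes(sequence: str, snp_pos: int, ref: str, alt: str) -> list:
    """Return [left flank, right flank, centred ref, centred alt] probes."""
    if sequence[snp_pos] != ref:
        raise ValueError("reference allele does not match the sequence")
    if snp_pos < PROBE_LENGTH or snp_pos + PROBE_LENGTH >= len(sequence):
        raise ValueError("not enough flanking sequence around the SNP")

    # Flanking probes end just before and start just after the SNP.
    left_flank = sequence[snp_pos - PROBE_LENGTH:snp_pos]
    right_flank = sequence[snp_pos + 1:snp_pos + 1 + PROBE_LENGTH]

    # Centred probes carry one allele each at the assumed offset.
    downstream = PROBE_LENGTH - UPSTREAM_IN_CENTRED_PROBE - 1
    upstream_seq = sequence[snp_pos - UPSTREAM_IN_CENTRED_PROBE:snp_pos]
    downstream_seq = sequence[snp_pos + 1:snp_pos + 1 + downstream]
    centred_ref = upstream_seq + ref + downstream_seq
    centred_alt = upstream_seq + alt + downstream_seq
    return [left_flank, right_flank, centred_ref, centred_alt]


toy_sequence = "ACGT" * 40  # toy sequence, not real genomic data
probes = design_probes(toy_sequence, snp_pos=80, ref="A", alt="C")
```

The flanking probes never contain the variable base, so they should capture either allele equally well, while the centred pair covers both alleles explicitly.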

Sequencing of capture products and data processing. We sequenced capture
products using 2 × 75 bp reads on an Illumina HiSeq2500 or an Illumina
NextSeq500. We de-multiplexed the reads allowing one mismatch in each of the
two indices (Extended Data Table 6), and merged paired reads into sequenced
fragments requiring an overlap of at least 15 bp (allowing one mismatch) using a
modified form of SeqPrep (https://github.com/jstjohn/SeqPrep). We used the
bases with the higher quality (and score) to represent the overlap region. After
removing adapters, we mapped merged fragments to hg19 with BWA (v.0.6.1)
using the ‘samse’ command. We identified duplicated fragments on the basis of
sharing the same orientation and end positions, in which case we kept the
fragment with the highest quality (Extended Data Table 7).
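
The duplicate-removal rule above (same orientation and end positions, keep the highest-quality fragment) can be sketched as below. The `Fragment` record and its single `quality` field are hypothetical simplifications; the text does not say how fragment quality was summarised.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    chrom: str
    start: int
    end: int
    strand: str      # "+" or "-"
    quality: float   # hypothetical per-fragment summary quality


def remove_duplicates(fragments):
    """Keep one fragment per (chrom, start, end, strand), preferring higher quality."""
    best = {}
    for fragment in fragments:
        key = (fragment.chrom, fragment.start, fragment.end, fragment.strand)
        if key not in best or fragment.quality > best[key].quality:
            best[key] = fragment
    return list(best.values())


toy_fragments = [
    Fragment("chr1", 100, 160, "+", 30.0),
    Fragment("chr1", 100, 160, "+", 35.0),  # duplicate of the first, higher quality
    Fragment("chr1", 100, 160, "-", 20.0),  # same ends, opposite orientation
]
kept = remove_duplicates(toy_fragments)
```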

To focus on putatively deaminated fragments, we restricted the analysis to
fragments with C-to-T substitutions relative to the hg19 human genome reference
sequence in the first 5′ or last two 3′ bases for the UDG-treated libraries,
and to fragments with C-to-T substitutions relative to hg19 in the terminal
three bases at either end of fragments from non-UDG-treated libraries
(Supplementary Note 1 and Extended Data Table 8).
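
A minimal sketch of the deamination filter described above, assuming a gap-free alignment of equal-length read and reference strings written 5′ to 3′ on the same strand. It reads "the first 5′ or last two 3′ bases" as the first base at the 5′ end plus the last two bases at the 3′ end; the function name is illustrative.

```python
def is_putatively_deaminated(read: str, reference: str, udg_treated: bool) -> bool:
    """True if the read shows a C-to-T difference at the positions screened."""
    if len(read) != len(reference):
        raise ValueError("read and reference must be aligned without gaps")
    length = len(read)
    if udg_treated:
        # First base at the 5' end, last two bases at the 3' end.
        positions = {0} | set(range(max(length - 2, 0), length))
    else:
        # Terminal three bases at either end.
        positions = set(range(min(3, length))) | set(range(max(length - 3, 0), length))
    return any(reference[i] == "C" and read[i] == "T" for i in positions)


toy_reference = "ACCGTAGCCA"      # toy sequence, not real data
internal_change = "ACTGTAGCCA"    # C-to-T at index 2
terminal_change = "ACCGTAGCTA"    # C-to-T at index 8
```

With these toy reads, the change at index 2 passes only the non-UDG rule, while the change at index 8 passes both.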

Merging the Oase 1 data with genome sequences. At each SNP covered at least
once in Oase 1, we selected the majority allele (in case of a tie, we picked a
random allele). We then merged the Oase 1 data with 25 genomes of present-day
humans sequenced to 24-42-fold coverage, the Altai Neanderthal, the Siberian
Denisovan, a ~45,000-year-old modern human from Ust’-Ishim in Siberia, an
~8,000-year-old Mesolithic individual from Loschbour Cave, Luxembourg, and a
~7,000-year-old early farmer from Stuttgart, Germany (Extended Data Table 9).
All the genotype calls for the five deeply sequenced ancient genomes were
performed in the same way. We restricted analyses to sites with a minimum
root-mean-square mapping quality (MAPQ) of 30 in the 30 genomes. We added lower
coverage shotgun data from the ~36,000-year-old Kostenki 14 from Russia, the
~24,000-year-old Mal’ta Siberian individual from Russia, an 8,000-year-old
Mesolithic individual from La Braña Cave, Spain, a Neanderthal from Mezmaiskaya
in Russia, and a pool of three Neanderthals from Vindija Cave in Croatia. For
these samples, we restricted to fragments with a map quality of MAPQ ≥ 37 to
match the filter for the low-coverage Oase 1 data (Extended Data Table 9).
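
The majority-allele rule with random tie-breaking used for the low-coverage Oase 1 data can be sketched as follows. The function name and the explicit random-number generator argument are illustrative choices that make the tie-break reproducible.

```python
import random
from collections import Counter


def majority_allele(observed_alleles, rng: random.Random) -> str:
    """Return the most frequent allele, breaking ties uniformly at random."""
    counts = Counter(observed_alleles)
    if not counts:
        raise ValueError("at least one observed allele is required")
    top = max(counts.values())
    candidates = sorted(allele for allele, n in counts.items() if n == top)
    return rng.choice(candidates)


rng = random.Random(0)
clear_call = majority_allele(["A", "A", "G"], rng)
tied_call = majority_allele(["C", "T"], rng)
```

At a site covered by a single fragment the rule simply returns that fragment's allele, which is the common case at the coverage reported for Oase 1.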

Population genetic analyses. To determine the relationship of Oase 1 to other
modern humans, we used D-statistics to evaluate whether sets of four tested
samples are consistent with being related to one another according to an
unrooted tree (Supplementary Note 3). We used D-statistics and f4-statistic
ratios both to test whether there is excess archaic ancestry in Oase 1 compared
with other modern humans, and to estimate proportions of Neanderthal ancestry
(Supplementary Note 4). We studied the genomic distribution of alleles that are
likely to derive from Neanderthals in the sense of being shared with
Neanderthals but either absent or at very low frequency in West Africans. We
used the spatial distribution of these sites to identify stretches of likely
Neanderthal ancestry in several individuals including Oase 1. We also used
these data to estimate the number of generations since the most recent
Neanderthal ancestor of Oase 1 (Supplementary Note 5).
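
For orientation, the sketch below computes a D-statistic and an f4-ratio from per-site allele frequencies using commonly used definitions. It is not the authors' implementation: the exact estimators and the standard errors behind the reported Z-scores are defined in Supplementary Notes 3 and 4, and all function names and toy frequencies here are illustrative.

```python
def f4(a, b, c, d):
    """Average over sites of (a - b) * (c - d) for allele frequencies a, b, c, d."""
    if not (len(a) == len(b) == len(c) == len(d)) or len(a) == 0:
        raise ValueError("need four non-empty frequency lists of equal length")
    return sum((w - x) * (y - z) for w, x, y, z in zip(a, b, c, d)) / len(a)


def d_statistic(p1, p2, p3, p4):
    """D(P1, P2; P3, P4): positive when P3 shares more alleles with P1 than with P2."""
    if not (len(p1) == len(p2) == len(p3) == len(p4)):
        raise ValueError("need four frequency lists of equal length")
    numerator = sum((w - x) * (y - z) for w, x, y, z in zip(p1, p2, p3, p4))
    denominator = sum(
        (w + x - 2 * w * x) * (y + z - 2 * y * z)
        for w, x, y, z in zip(p1, p2, p3, p4)
    )
    if denominator == 0:
        raise ValueError("no informative sites")
    return numerator / denominator


def f4_ratio(numerator_pops, denominator_pops):
    """Ratio of two f4 statistics, each given as a tuple of four frequency lists."""
    return f4(*numerator_pops) / f4(*denominator_pops)


# Toy per-site frequencies (0 or 1 for single genomes); not real data.
p1 = [1, 0, 1, 0]
p2 = [0, 0, 1, 1]
p3 = [1, 0, 1, 0]
p4 = [0, 0, 0, 0]
```

With this sign convention, the D-statistics in Extended Data Table 1 are positive when the early modern human shares more alleles with the first-listed non-African population, as the table legend states.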
Extended Data Figure 1 | Mitochondrial DNA tree for Oase 1 and other modern humans. The consensus sequences for all Oase 1 fragments and for
deaminated fragments are shown. The tree is rooted with a Neanderthal mtDNA (Vindija33.25).
Extended Data Table 1 | Allele sharing between early modern humans and other humans

                                  Oase 1            Ust’-Ishim        Kostenki 14
Non-African1   Non-African2       D        Z        D        Z        D        Z
Oase 1         Ust’-Ishim         n/a      n/a      n/a      n/a      -0.0033  -3.8
Oase 1         Kostenki 14        n/a      n/a      -0.0037  -4.1     n/a      n/a
Oase 1         MA1                n/a      n/a      -0.0032  -3.5     -0.0092  -9.8
Oase 1         Loschbour          n/a      n/a      -0.0032  -3.9     -0.0101  -12.2
Oase 1         East Asian         n/a      n/a      -0.0027  -3.8     -0.0011  -1.6
Oase 1         Native American    n/a      n/a      -0.0030  -4.1     -0.0039  -5.5
Ust’-Ishim     Kostenki 14        -0.0005  -0.6     n/a      n/a      n/a      n/a
Ust’-Ishim     MA1                -0.0007  -0.8     n/a      n/a      -0.0059  -6.4
Ust’-Ishim     Loschbour          0.0002   0.3      n/a      n/a      -0.0068  -8.5
Ust’-Ishim     East Asian         0.0000   -0.1     n/a      n/a      0.0022   3.3
Ust’-Ishim     Native American    -0.0007  -1.0     n/a      n/a      -0.0006  -0.8
Kostenki 14    MA1                -0.0004  -0.6     0.0003   0.4      n/a      n/a
Kostenki 14    Loschbour          0.0007   1.0      0.0006   0.8      n/a      n/a
Kostenki 14    East Asian         0.0004   0.6      0.0011   1.6      n/a      n/a
Kostenki 14    Native American    -0.0002  -0.3     0.0008   1.1      n/a      n/a
MA1            Loschbour          0.0012   1.7      0.0005   0.7      -0.0012  -1.5
MA1            East Asian         0.0008   1.2      0.0007   1.1      0.0079   10.6
MA1            Native American    0.0001   0.1      0.0004   0.6      0.0051   7.0
Loschbour      East Asian         -0.0002  -0.4     0.0005   0.9      0.0090   13.7
Loschbour      Native American    -0.0009  -1.5     0.0002   0.3      0.0062   9.0
East Asian     Native American    -0.0006  -1.6     -0.0003  -0.8     -0.0028  -6.6
European       Oase 1             n/a      n/a      0.0004   0.6      0.0049   7.3
European       Ust’-Ishim         -0.0023  -3.5     n/a      n/a      0.0016   2.4
European       Kostenki 14        -0.0028  -4.7     -0.0033  -5.1     n/a      n/a
European       MA1                -0.0033  -5.4     -0.0031  -5.1     -0.0041  -6.0
European       Loschbour          -0.0021  -4.5     -0.0027  -5.7     -0.0052  -9.1
European       East Asian         -0.0024  -5.2     -0.0022  -5.3     0.0039   9.2
European       Native American    -0.0030  -6.4     -0.0025  -5.9     0.0010   2.2
European       Stuttgart          -0.0007  -1.5     -0.0001  -0.2     -0.0002  -0.3
Stuttgart      Oase 1             n/a      n/a      0.0005   0.6      0.0051   6.7
Stuttgart      Ust’-Ishim         -0.0017  -2.3     n/a      n/a      0.0018   2.3
Stuttgart      Kostenki 14        -0.0021  -3.2     -0.0032  -4.6     n/a      n/a
Stuttgart      MA1                -0.0027  -3.9     -0.0029  -4.2     -0.0041  -5.0
Stuttgart      Loschbour          -0.0015  -2.4     -0.0027  -4.6     -0.0050  -7.5
Stuttgart      East Asian         -0.0017  -2.9     -0.0022  -3.8     0.0040   6.8
Stuttgart      Native American    -0.0024  -3.9     -0.0025  -4.4     0.0012   1.9

We compute D(Non-African1, Non-African2; Early Modern Human, African) to test whether an early modern human (Oase 1, Ust’-Ishim, or Kostenki 14) shares more alleles with Non-African1 (in which case the
statistic is positive) or Non-African2 (negative). We use a pool of six sub-Saharan African genomes (2 Mbuti, 2 Yoruba, 2 Dinka) as an outgroup; a pool of four genomes (2 French, 2 Sardinians) to represent
Europeans; a pool of four genomes (2 Han, 2 Dai) to represent East Asians; and a pool of three genomes (2 Karitiana, 1 Mixe) to represent Native Americans. Results are based on 242,122 transition and
transversion SNPs covered by at least one deaminated fragment in Oase 1, and covered in all other samples, although not necessarily MA1. For analyses involving MA1, a subset of 176,569 transversion SNPs was
analysed.
Extended Data Table 2 | Allele sharing between early modern humans and other humans (transversions only)

We compute D(Non-African1, Non-African2; Early Modern Human, African) to test whether an early modern human (Oase 1, Ust’-Ishim or Kostenki 14) shares more alleles with Non-African1 (in which case the
statistic is positive) or Non-African2 (negative). We use a pool of six sub-Saharan African genomes (2 Mbuti, 2 Yoruba, 2 Dinka) as an outgroup; a pool of four genomes (2 French, 2 Sardinians) to represent
Europeans; a pool of four genomes (2 Han, 2 Dai) to represent East Asians; and a pool of three genomes (2 Karitiana, 1 Mixe) to represent Native Americans. Statistics are as in Extended Data Table 1 but are based
on 106,004 transversion SNPs covered by at least one deaminated fragment in Oase 1 and that also have coverage for all other samples, although not necessarily MA1. For analyses involving MA1, a subset of
76,715 transversion SNPs is analysed.
Extended Data Table 3 | Testing whether archaic genomes share more alleles with Oase 1 than with other modern humans

                        Archaic = Altai                       Archaic = Denisovan
                        Outgroup: Chimp    Outgroup: Mbuti    Outgroup: Chimp    Outgroup: Mbuti
Test         Sites      D        Z         D        Z         D        Z         D        Z
Han          115,300    -0.0036  -5.1      -0.0071  -7.6      -0.0014  -2.2      -0.0049  -6.3
Dai          115,300    -0.0035  -5.0      -0.0077  -8.2      -0.0013  -2.1      -0.0056  -7.0
Karitiana    115,300    -0.0032  -4.3      -0.0063  -6.9      -0.0008  -1.3      -0.0040  -5.3
French       115,300    -0.0049  -6.9      -0.0074  -8.2      -0.0021  -3.4      -0.0047  -6.2
Sardinian    115,300    -0.0038  -5.1      -0.0071  -7.8      -0.0016  -2.5      -0.0050  -6.5
Papuan       115,300    -0.0026  -3.6      -0.0051  -5.4      0.0009   1.5       -0.0016  -2.1
Ust’-Ishim   115,100    -0.0026  -3.6      -0.0052  -5.5      -0.0009  -1.5      -0.0035  -4.4
Kostenki 14  108,100    -0.0032  -4.1      -0.0059  -6.0      -0.0017  -2.4      -0.0044  -5.3
MA1          83,200     -0.0031  -3.6      -0.0050  -4.7      -0.0007  -0.9      -0.0028  -2.8
Loschbour    114,300    -0.0043  -5.7      -0.0066  -6.8      -0.0019  -2.9      -0.0043  -5.3
La Braña     111,000    -0.0033  -4.2      -0.0072  -7.3      -0.0008  -1.2      -0.0047  -5.4
Stuttgart    114,000    -0.0037  -5.1      -0.0066  -7.1      -0.0013  -2.1      -0.0042  -5.6

The statistic D(Test, Oase 1; Archaic, Outgroup) is negative if the archaic genomes share more alleles with Oase 1 than with a test sample. The outgroups are either chimpanzee or a sub-Saharan African (Mbuti).
Extended Data Table 4 | Estimated fraction of the Oase 1 genome that derives from Neanderthals

Statistic 1: f4(Denisova, Altai; Mbuti, X) / f4(Denisova, Altai; Mbuti, Mezmaiskaya)
Statistic 2: 1 − f4(Mbuti, Chimp; X, Denisova) / f4(Mbuti, Chimp; Dinka, Denisova)
Statistic 3: f4(X, Mbuti; Denisova, Chimp) / f4(Altai, Mbuti; Denisova, Chimp)

             Statistic 1                Statistic 2                Statistic 3
Sample       Prop.   S.E.   90% CI      Prop.   S.E.   90% CI      Prop.   S.E.   90% CI
Oase 1       11.3%   2.8%   6.7%-16%    10.9%   1.6%   8.3%-13.6%  8.4%    2.7%   4.0%-12.9%
Ust’-Ishim   2.9%    1.2%   1.0%-4.9%   6.0%    0.8%   4.7%-7.4%   4.2%    1.5%   1.8%-6.6%
Kostenki 14  3.0%    1.4%   0.7%-5.3%   3.0%    0.9%   1.6%-4.5%   6.2%    1.6%   3.6%-8.7%
MA1          1.5%    1.5%   0.0%-4.0%   3.6%    1.0%   1.9%-5.2%   5.5%    1.6%   2.8%-8.2%
Loschbour    1.1%    1.2%   0.0%-3.1%   4.8%    0.9%   3.3%-6.2%   3.6%    1.5%   1.2%-6.1%
La Braña     3.7%    1.3%   1.4%-5.9%   2.4%    0.9%   0.9%-3.8%   4.8%    1.5%   2.4%-7.2%
Stuttgart    2.8%    1.2%   0.8%-4.8%   3.4%    0.9%   2.0%-4.9%   3.8%    1.5%   1.4%-6.2%
Han          1.0%    1.3%   0.0%-3.1%   2.8%    0.9%   1.3%-4.2%   3.6%    1.5%   1.2%-6.1%
Dai          2.1%    1.2%   0.2%-4.0%   1.3%    0.9%   0.0%-2.8%   3.8%    1.5%   1.4%-6.2%
French       1.6%    1.2%   0.0%-3.5%   3.3%    0.9%   1.9%-4.7%   2.7%    1.5%   0.3%-5.2%
Sardinian    2.7%    1.2%   0.8%-4.7%   2.3%    0.9%   0.8%-3.7%   3.7%    1.4%   1.3%-6.1%

Estimates are as in Table 1 but restrict to transversions. Present-day human genomes are from a data set reported previously.
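
The 90% confidence intervals in Table 1 and Extended Data Table 4 are numerically consistent with a normal approximation, estimate ± 1.645 × standard error, truncated at 0%. That is an inference from the tabulated numbers rather than something the text states, so the sketch below should be read as a consistency check only; `ci90` is an illustrative name.

```python
Z_90 = 1.645  # two-sided 90% quantile of the standard normal distribution


def ci90(proportion_percent: float, standard_error_percent: float):
    """Normal-approximation 90% interval, truncated at 0% as in the table legends."""
    half_width = Z_90 * standard_error_percent
    lower = max(0.0, proportion_percent - half_width)
    upper = max(0.0, proportion_percent + half_width)
    return lower, upper


# Oase 1, Statistic 1, transversions only: 11.3% with S.E. 2.8%.
oase_lower, oase_upper = ci90(11.3, 2.8)
# MA1, Statistic 1, transversions only: 1.5% with S.E. 1.5% (lower bound truncated).
ma1_lower, ma1_upper = ci90(1.5, 1.5)
```

Small mismatches in the last digit are expected because the tabulated proportions and standard errors are themselves rounded.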
Extended Data Table 5 | Counts of putative Neanderthal alleles in six modern humans

              Neanderthal allele counts                                Neanderthal ancestry
Chr  Sites    Oase 1  Ust’-Ishim  Kostenki 14  Han   French  Dinka     Oase 1  Ust’-Ishim  Kostenki 14  Han     French
1    6740     323     196         148          129   117     25        6.70%   3.84%       2.77%        2.34%   2.07%
2    7112     294     145         121          188   199     29        5.65%   2.47%       1.96%        3.39%   3.62%
3    5417     177     102         96           74    98      28        4.17%   2.07%       1.90%        1.29%   1.96%
4    4495     359     86          63           141   96      42        10.69%  1.48%       0.71%        3.34%   1.82%
5    4330     446     108         66           103   95      23        14.80%  2.97%       1.50%        2.80%   2.52%
6    4549     324     155         167          142   138     73        8.36%   2.73%       3.13%        2.30%   2.16%
7    4422     147     68          65           102   72      34        3.87%   1.16%       1.06%        2.33%   1.30%
8    4322     131     132         72           35    38      14        4.10%   4.14%       2.03%        0.74%   0.84%
9    3107     500     69          120          118   49      15        23.65%  2.63%       5.12%        5.02%   1.66%
10   4009     147     139         67           131   86      22        4.72%   4.42%       1.70%        4.12%   2.42%
11   4193     153     93          88           81    73      26        4.59%   2.42%       2.24%        1.99%   1.70%
12   3456     456     160         54           125   93      10        19.55%  6.58%       1.93%        5.04%   3.64%
13   2457     96      81          33           54    30      18        4.81%   3.89%       0.93%        2.22%   0.74%
14   2390     85      27          52           50    52      13        4.56%   0.89%       2.47%        2.35%   2.47%
15   2327     73      78          47           38    32      5         4.43%   4.75%       2.73%        2.15%   1.76%
16   3139     90      121         68           43    39      8         3.96%   5.45%       2.90%        1.69%   1.50%
17   2543     72      89          37           85    75      56        0.95%   1.97%       -1.13%       1.73%   1.13%
18   2305     57      58          59           27    29      5         3.42%   3.48%       3.55%        1.45%   1.58%
19   1769     79      49          33           43    35      12        5.74%   3.17%       1.80%        2.66%   1.97%
20   2492     107     29          62           56    43      12        5.78%   1.03%       3.04%        2.68%   1.88%
21   1026     36      53          22           8     11      10        3.84%   6.35%       1.77%        -0.30%  0.15%
22   1455     79      33          66           34    18      5         7.71%   2.92%       6.35%        3.02%   1.35%
All  78055    4231    2071        1606         1807  1518    485       7.27%   3.08%       2.18%        2.57%   2.0%

Subtract Dinka  3746  1586        1121         1322  1033    0

The analysis is based on 78,055 sites covered by at least one deaminated fragment in Oase 1. To convert the counts to estimates of ancestry, we subtract the Dinka count as an estimate of the false positive rate and
divide by the number of sites covered (as indicated for the whole genome on the bottom). This gives the rate of alleles per screened site on this chromosome for this individual. We then multiply this quantity by
2%/1.32% to recalibrate the 1.32% seen genome-wide in the French to an assumed 2% genome-wide Neanderthal ancestry in the French.
Extended Data Table 6 | Ancient DNA libraries made from the Oase 1 mandible

Metainformation                                                Sequencing results                                   All fragments                       Deaminated fragments
Library  Extract  UDG treatment  Index 1  Index 2  Extract     Sequences going  Sequences >35 bp  After dup.  Coverage  % C→T    % C→T    Coverage  % C→T    % C→T
                                                   used        into alignment   mapped            removal               5′ end   3′ end             5′ end   3′ end
A5227    E1406    Yes            ACTTGCG  AACTCCG              206,982          118,976           34,486      112       8        19       5         19       36
A5252    E1843    Yes            GTAAGCC  TTGAAGT  40          74,384           46,394            31,368      114       7        25       5         18       55
A9032    E1406    No             ATAACGT  ACTATCA  6           9,321,903        5,904,210         51,810      178       20       21       12        31       39
A9033    E1406    No             AATAGGA  ACCAACT  66          7,932,271        4,816,314         55,878      193       21       20       13        36       38
A9034    E1406    No             ATCACGA  AACTCCG  6           10,422,467       6,861,634         59,883      207       20       20       14        35       38
Total                                                          27,958,007       17,747,528        233,425     803       17       21       49        30       39
Extended Data Table 7 | Sequencing metrics on the five libraries for the four capture probe panels

Panel     No. target  Fragments going  Fragments mapped  Fragments on target      % SNPs hit     Average coverage
          SNPs        into alignment   to genome         after dup. removal and   at least once  on SNPs
                                                         MAPQ37 filter
390k      393,577     10,849,144       2,235,955         33,564                   26.5%          0.34
390k      393,577     17,159,085       2,808,704         73,824                   15.9%
390k      393,577     16,902,935       3,256,438         42,520                   27.7%
390k      393,577     63,441,719       22,124,247        95,161                   36.0%
390k      393,577     60,181,844       14,278,978        80,626                   33.3%
390k      393,577     168,534,727      44,704,322        724,653                  73.0%
840k      842,630     25,105,625       3,801,435         78,015
840k      842,630     29,196,969       4,655,434         83,093
840k      842,630     35,780,652       5,968,851         200,767
840k      842,630     28,209,496       4,276,439         52,411
840k      842,630     20,286,540       1,630,343         06,943
840k      842,630     138,579,282      20,332,502        818,648
1000k     997,780     26,088,835       2,964,094         59,162
1000k     997,780     26,641,358       4,490,372         58,614
1000k     997,780     28,795,043       4,985,140         54,177
1000k     997,780     25,848,311       4,395,413         71,537
1000k     997,780     25,691,323       2,254,636         53,932
1000k     997,780     133,064,870      19,089,655        596,107
Archaic   1,749,385   19,329,832       2,086,208         205,095
Archaic   1,749,385   24,629,023       2,768,355         237,818
Archaic   1,749,385   31,200,466       3,783,805         257,351
Archaic   1,749,385   27,659,125       3,606,375         195,356
Archaic   1,749,385   31,472,143       2,435,080         136,637
Archaic   1,749,385   134,290,589      14,679,823        1,022,046
Combined              81,373,436       11,087,692        719,146
Combined              97,626,435       14,722,865        698,890
Combined              112,679,096      17,994,234        806,589
Combined              145,158,651      34,402,474        666,195
Combined              137,631,850      20,599,037        531,873
Combined              574,469,468      98,806,302        3,406,685
Extended Data Table 8 | Effect of filtering on amount of nuclear data available

                              All fragments                       Deaminated fragments only
Panels           Target SNPs  No. SNPs   % SNPs    Average        No. SNPs   % SNPs    Average
                              hit ≥1×    hit ≥1×   coverage       hit ≥1×    hit ≥1×   coverage
Panels 1-3       2,051,902    1,038,619  50.6%     1.03           271,326    13.2%     0.16
Panel 4 subset*  954,849      361,681    37.9%     0.69           87,803     9.2%      0.11
Panels 1-4       3,801,245    1,685,891  44.4%     0.85           426,027    11.2%     0.13

Note that numbers differ from Extended Data Table 7 because only sites with base quality ≥20 were used.
*The Panel 4 subset excludes the sites where only the Denisovan genome differs from the African panel.
Extended Data Table 9 | Genomes merged with the Oase 1 data

Sample ID      Human    Data type      Mean coverage
Oase 1         Modern   Low coverage   Capture
Vindija        Archaic  Low coverage   1.3
Mezmaiskaya    Archaic  Low coverage   0.5
Altai          Archaic  High coverage  52
Denisova       Archaic  High coverage  31
Kostenki 14    Modern   Low coverage   2.4
MA1            Modern   Low coverage   1
La Braña       Modern   Low coverage   3.4
Loschbour      Modern   High coverage  22
Stuttgart      Modern   High coverage  19
Ust’-Ishim     Modern   High coverage  42
Dinka_A        Modern   High coverage  28
French_A       Modern   High coverage  27
Papuan_A       Modern   High coverage  26
Sardinian_A    Modern   High coverage  25
Han_A          Modern   High coverage  28
Yoruba_A       Modern   High coverage  32
Karitiana_A    Modern   High coverage  26
San_A          Modern   High coverage  33
Mandenka_A     Modern   High coverage  25
Dai_A          Modern   High coverage  28
Mbuti_A        Modern   High coverage  24
Dai_B          Modern   High coverage  37
French_B       Modern   High coverage  42
Han_B          Modern   High coverage  35
Mandenka_B     Modern   High coverage  37
Mbuti_B        Modern   High coverage  37
Papuan_B       Modern   High coverage  42
San_B          Modern   High coverage  38
Sardinian_B    Modern   High coverage  38
Yoruba_B       Modern   High coverage  39
Karitiana_B    Modern   High coverage  35
Mixe_B         Modern   High coverage  42
Australian_B   Modern   High coverage  42
Australian_B   Modern   High coverage  37
Dinka_B        Modern   High coverage  35

Library notes: UDG-treated; Mix of library types.

For the 25 present-day humans, individuals ending with a subscript ‘A’ are from ‘Panel A’ reported in ref. 9 and individuals with a subscript ‘B’ are from ‘Panel B’ reported in ref. 7. Unless otherwise specified, we
used Panel B individuals.
The octopus genome and the evolution of cephalopod

Coleoid cephalopods (octopus, squid and cuttlefish) are active,
resourceful predators with a rich behavioural repertoire. They
have the largest nervous systems among the invertebrates and
present other striking morphological innovations including
camera-like eyes, prehensile arms, a highly derived early embryogenesis
and a remarkably sophisticated adaptive colouration system. To
investigate the molecular bases of cephalopod brain and body
innovations, we sequenced the genome and multiple transcriptomes
of the California two-spot octopus, Octopus bimaculoides.
We found no evidence for hypothesized whole-genome duplications
in the octopus lineage. The core developmental and neuronal
gene repertoire of the octopus is broadly similar to that found
across invertebrate bilaterians, except for massive expansions in
two gene families previously thought to be uniquely enlarged in
vertebrates: the protocadherins, which regulate neuronal development,
and the C2H2 superfamily of zinc-finger transcription factors.
Extensive messenger RNA editing generates transcript and
protein diversity in genes involved in neural excitability, as
previously described, as well as in genes participating in a broad range
of other cellular functions. We identified hundreds of
cephalopod-specific genes, many of which showed elevated expression levels in
such specialized structures as the skin, the suckers and the nervous
system. Finally, we found evidence for large-scale genomic
rearrangements that are closely associated with transposable element
expansions. Our analysis suggests that substantial expansion of a
handful of gene families, along with extensive remodelling of genome
linkage and repetitive content, played a critical role in the
evolution of cephalopod morphological innovations, including
their large and complex nervous systems.

Soft-bodied cephalopods such as the octopus (Fig. 1a) show remarkable
morphological departures from the basic molluscan body plan,
including dexterous arms lined with hundreds of suckers that function
as specialized tactile and chemosensory organs, and an elaborate
chromatophore system under direct neural control that enables rapid
changes in appearance. The octopus nervous system is vastly modified
in size and organization relative to other molluscs, comprising a
circumesophageal brain, paired optic lobes and axial nerve cords in
each arm. Together these structures contain nearly half a billion
neurons, more than six times the number in a mouse brain. Extant
coleoid cephalopods show extraordinarily sophisticated behaviours
including complex problem solving, task-dependent conditional
discrimination, observational learning and spectacular displays of
camouflage (Supplementary Videos 1 and 2).

To explore the genetic features of these highly specialized animals,
we sequenced the Octopus bimaculoides genome by a whole-genome
shotgun approach (Supplementary Note 1) and annotated it using
extensive transcriptome sequence from 12 tissues (Methods and
Supplementary Note 2). The genome assembly captures more than
97% of expressed protein-coding genes and 83% of the estimated
2.7 gigabase (Gb) genome size (Methods and Supplementary Notes
1-3). The unassembled fraction is dominated by high-copy repetitive
sequences (Supplementary Note 1). Nearly 45% of the assembled genome
is composed of repetitive elements, with two bursts of transposon
activity occurring ~25-million and ~56-million years ago (Mya)
(Supplementary Note 4).

We predicted 33,638 protein-coding genes (Methods and Supplementary
Note 4) and found alternate splicing at 2,819 loci, but no locus
showed an unusually high number of splice variants (Supplementary
Note 4). A-to-G discrepancies between the assembled genome and
transcriptome sequences provided evidence for extensive mRNA editing
by adenosine deaminases acting on RNA (ADARs). Many candidate
edits are enriched in neural tissues and are found in a range of gene
families, including ‘housekeeping’ genes such as the tubulins, which
suggests that RNA edits are more widespread than previously
appreciated (Extended Data Fig. 1 and Supplementary Note 5).
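
The idea of flagging candidate editing sites from A-to-G discrepancies can be illustrated with a toy comparison. This sketch assumes a gap-free, same-strand alignment of a genomic segment and its transcript, and it is not the pipeline used in the study; a real analysis would also have to exclude genomic polymorphisms and sequencing errors. ADAR editing converts adenosine to inosine, which sequencing reads as G.

```python
from typing import List


def candidate_edit_sites(genome_segment: str, transcript_segment: str) -> List[int]:
    """Return 0-based positions where the genome has A and the transcript has G."""
    if len(genome_segment) != len(transcript_segment):
        raise ValueError("segments must be aligned without gaps")
    return [
        position
        for position, (genome_base, transcript_base) in enumerate(
            zip(genome_segment.upper(), transcript_segment.upper())
        )
        if genome_base == "A" and transcript_base == "G"
    ]


toy_genome = "ATGAACGTA"      # toy sequences, not real octopus data
toy_transcript = "ATGAGCGTA"  # A-to-G difference at position 4
sites = candidate_edit_sites(toy_genome, toy_transcript)
```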

Based primarily on chromosome number, several researchers proposed
that whole-genome duplications were important in the evolution
of the cephalopod body plan, paralleling the role ascribed to the
independent whole-genome duplication events that occurred early in
vertebrate evolution. Although this is an attractive framework for
both gene family expansion and increased regulatory complexity
across multiple genes, we found no evidence for it. The gene family
expansions present in octopus are predominantly organized in
clusters along the genome, rather than distributed in doubly conserved
synteny as expected for a paleopolyploid (Supplementary Note 6.2).
Although genes that regulate development are often retained in multiple
copies after paleopolyploidy in other lineages, they are not generally
expanded in octopus relative to limpet, oyster and other invertebrate
bilaterians (Table 1 and Supplementary Notes 7.4 and 8).

Hox genes are commonly retained in multiple copies following
whole-genome duplication. In O. bimaculoides, however, we found
only a single Hox complement, consistent with the single set of Hox
transcripts identified in the bobtail squid Euprymna scolopes with
PCR. Remarkably, octopus Hox genes are not organized into clusters
as in most other bilaterian genomes, but are completely atomized
(Extended Data Fig. 2 and Supplementary Note 9). Although we cannot
rule out whole-genome duplication followed by considerable gene
loss, the extent of loss needed to support this claim would far exceed
that which has been observed in other paleopolyploid lineages, and it is
more plausible that chromosome number in coleoids increased by
chromosome fragmentation.

Mechanisms other than whole-genome duplications can drive
genomic novelty, including expansion of existing gene families, evolution
of novel genes, modification of gene regulatory networks, and
reorganization of the genome through transposon activity. Within
the O. bimaculoides genome, we found evidence for all of these
mechanisms, including expansions in several gene families, a suite
of octopus- and cephalopod-specific genes, and extensive genome
shuffling.

Figure 1 | Octopus anatomy and gene family representation analysis.
a, Schematic of Octopus bimaculoides anatomy, highlighting the tissues
sampled for transcriptome analysis: viscera (heart, kidney and
hepatopancreas), yellow; gonads (ova or testes), peach; retina, orange; optic
lobe (OL), maroon; supraesophageal brain (Supra), bright pink; subesophageal
brain (Sub), light pink; posterior salivary gland (PSG), purple; axial nerve cord
(ANC), red; suckers, grey; skin, mottled brown; stage 15 (St15) embryo,
aquamarine. Skin sampled for transcriptome analysis included the eyespot,
shown in light blue. b, C2H2 and protocadherin domain-containing gene
families are expanded in octopus. Enriched Pfam domains were identified in
lophotrochozoans (green) and molluscs (yellow), including O. bimaculoides
(light blue). For a domain to be labelled as expanded in a group, at least 50% of
its associated gene families need a corrected P value of 0.01 against the outgroup
average. Some Pfams (for example, Cadherin and Cadherin_2) may occur
in the same gene, however multiple domains in a given gene were counted
only once. Bfl, Branchiostoma floridae; Cel, Caenorhabditis elegans; Cgi,
Crassostrea gigas; Cte, Capitella teleta; Dme, Drosophila melanogaster; Dre,
Danio rerio; Gga, Gallus gallus; Hsa, Homo sapiens; Hro, Helobdella robusta;
Lch, Latimeria chalumnae; Lgi, Lottia gigantea; Mmu, Mus musculus; Obi, O.
bimaculoides; Pfu, Pinctada fucata; Xtr, Xenopus tropicalis.

In gene family content, domain architecture and exon-intron
structure, the octopus genome broadly resembles that of the limpet
Lottia gigantea, the polychaete annelid Capitella teleta and the
cephalochordate Branchiostoma floridae (Supplementary Note 7
and Extended Data Fig. 3). Relative to these invertebrate bilaterians,
we found a fairly standard set of developmentally important transcription
factors and signalling pathway genes, suggesting that the
evolution of the cephalopod body plan did not require extreme expansions
of these ‘toolkit’ genes (Table 1 and Supplementary Note 8.2).
However, statistical analysis of protein domain distributions across
animal genomes did identify several notable gene family expansions
in octopus, including protocadherins, C2H2 zinc-finger proteins
(C2H2 ZNFs), interleukin-17-like genes (IL17-like), G-protein-coupled
receptors (GPCRs), chitinases and sialins (Figs 1b, 2 and 3;
Extended Data Figs 4-6 and Supplementary Notes 8 and 10).

The octopus genome encodes 168 multi-exonic protocadherin
genes, nearly three-quarters of which are found in tandem clusters
on the genome (Fig. 2b), a striking expansion relative to the 17-25
genes found in Lottia, Crassostrea gigas (oyster) and Capitella genomes.
Protocadherins are homophilic cell adhesion molecules whose
function has been primarily studied in mammals, where they are
required for neuronal development and survival, as well as synaptic
specificity. Single protocadherin genes are found in the invertebrate
deuterostomes Saccoglossus kowalevskii (acorn worm) and
Strongylocentrotus purpuratus (sea urchin), indicating that their absence in
Drosophila melanogaster and Caenorhabditis elegans is due to gene
loss. Vertebrates also show a remarkable expansion of the protocadherin
repertoire, which is generated by complex splicing from a clustered
locus rather than tandem gene duplication (reviewed in ref. 19).
Thus both octopuses and vertebrates have independently evolved a
diverse array of protocadherin genes.

A search of available transcriptome data from the longfin inshore
squid Doryteuthis (formerly, Loligo) pealeii also demonstrated an
expanded number of protocadherin genes (Supplementary Note
8.3). Surprisingly, our phylogenetic analyses suggest that the squid
and octopus protocadherin arrays arose independently. Unlinked
octopus protocadherins appear to have expanded ~135 Mya, after
octopuses diverged from squid. In contrast, clustered octopus protocadherins
are much more similar in sequence, either due to more
recent duplications or gene conversion as found in clustered protocadherins
in zebrafish and mammals.

The expression of protocadherins in octopus neural tissues (Fig. 2) is
consistent with a central role for these genes in establishing and maintaining
cephalopod nervous system organization as they do in vertebrates.
Protocadherin diversity provides a mechanism for regulating
the short-range interactions needed for the assembly of local neural
circuits, which is where the greatest complexity in the cephalopod
nervous system appears. The importance of local neuropil interactions,
rather than long-range connections, is probably due to the limits
placed on axon density and connectivity by the absence of myelin, as
thick axons are then required for rapid high-fidelity signal conduction
over long distances. The sequence divergence between octopus and
squid protocadherin expansions may reflect the notable differences
between octopuses and decapodiforms in brain organization, which
have been most clearly demonstrated for the vertical lobe, a key structure
in cephalopod learning and memory circuits. Finally, the independent
expansions and nervous system enrichment of protocadherins
in coleoid cephalopods and vertebrates offer a striking example of
convergent evolution between these clades at the molecular level.

Table 1 | Metazoan developmental control genes

                                Obi    Lgi    Cte    Dme    Cel    Bfl    Hsa
Ligands
  Fibroblast growth factor        3      2      1      3      3      8     22
  Wnt                            12     10     12      7      5     17     19
  TGFβ/BMP                       12      9     14      6      5     22     33
  Delta/Jagged                    4    [?]      1      2      4      2      7
  Hedgehog                        1    [?]      1      1      0      1      3
  Axon guidance                  10      9      9      6      8     23     33
Transcription factors
  C2H2 zinc-finger            1,790    413    222    326    211  1,338    764
  Homeodomain                   114    121    111    104     99    133    333
  High mobility group            23     15     14     13     16     51    125
  Helix loop helix               50     63     64     59     42     78    118
  Nuclear hormone receptor       40     44     45     16    274     33     48
  Fox                            16     28     26     17     18     42     43
  Tbox                            9      9    [?]      8     21      9     18

Number of members of developmental ligand and transcription factor families from O. bimaculoides
and selected other taxa. Dendrogram above species names reflects their evolutionary relationships. Bfl,
Branchiostoma floridae; Cel, Caenorhabditis elegans; Cte, Capitella teleta; Dme, Drosophila melanogaster;
Hsa, Homo sapiens; Lgi, Lottia gigantea; Obi, O. bimaculoides.

Figure 2 | Protocadherin expansion in octopus. a, For a larger version of
panel a, see Extended Data Fig. 11. Phylogenetic tree of cadherin genes in Hsa
(red), Dme (orange), Nematostella vectensis (mustard yellow), Amphimedon
queenslandica (yellow), Cte (green), Lgi (teal), Obi (blue), and Saccoglossus
kowalevskii (purple). I, Type I classical cadherins; II, calsyntenins; III, octopus
protocadherin expansion (168 genes); IV, human protocadherin expansion (58
genes); V, dachsous; VI, fat-like; VII, fat; VIII, CELSR; IX, Type II classical
cadherins. Asterisk denotes a novel cadherin with over 80 extracellular
cadherin domains found in Obi and Cte. b, Scaffold 30672 and Scaffold 9600
contain the two largest clusters of protocadherins, with 31 and 17, respectively.
Clustered protocadherins vary greatly in genomic span and are oriented in a
head-to-tail manner along each scaffold. c, Expression profiles of 161
protocadherins and 19 cadherins in 12 octopus tissues; 7 protocadherins were
not detected in the tissues sampled. Cells are coloured according to number of
standard deviations from the mean expression level. Protocadherins have high
expression in neural tissues. Cadherins generally show a similar expression
pattern, with the exception of a group of sucker-specific cadherins.

Figure 3 | C2H2 ZNF expansion in octopus. a, Genomic organization of the
largest C2H2 cluster. Scaffold 19852 contains 58 C2H2 genes that are
transcribed in different directions. b, Expression profile of C2H2 genes along
Scaffold 19852 in 12 octopus transcriptomes. Neural and developmental
transcriptomes show high levels of expression for a majority of these C2H2
genes. In a and b, arrow denotes scaffold orientation. c, Distribution of
fourfold synonymous site transversion distances (4DTv) between
C2H2-domain-containing genes.

As with the protocadherins, we found multiple clusters of C2H2
ZNF transcription factor genes (Fig. 3a and Supplementary Note 8.4).
The octopus genome contains nearly 1,800 multi-exonic C2H2-containing
genes (Table 1), more than the 200-400 C2H2 ZNFs found
in other lophotrochozoans and the 500-700 found in eutherian
mammals, in which they form the second-largest gene family.
C2H2 ZNF transcription factors contain multiple C2H2 domains that,
in combination, result in highly specific nucleic acid binding. The
octopus C2H2 ZNFs typically contain 10-20 C2H2 domains but some
have as many as 60 (Supplementary Note 8.4). The majority of the
transcripts are expressed in embryonic and nervous tissues (Fig. 3b).
This pattern of expression is consistent with roles for C2H2 ZNFs
in cell fate determination, early development and transposon silencing,
as demonstrated in genetic model systems.

The expansion of the O. bimaculoides C2H2 ZNFs coincides with a
burst of transposable element activity at ~25 Mya (Fig. 3c). The flanking
regions of these genes show a significant enrichment in a 70-90 base
pair (bp) tandem repeat (31% for C2H2 genes versus 4% for all genes;
Fisher’s exact test P value < 1 × 10⁻¹⁰), which parallels the linkage of
C2H2 gene expansions to β-satellite repeats in humans. We also
found an expanded C2H2 ZNF repertoire in amphioxus (Table 1),
showing a similar enrichment in satellite-like repeats. These parallels
suggest a common mode of expansion of a highly dynamic transcription
factor family implicated in lineage-specific innovations.
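The enrichment comparison in this paragraph (a tandem repeat in the flanks of 31% of C2H2 genes versus 4% of all genes) is a 2×2 contingency question of the kind Fisher's exact test addresses. The sketch below shows a one-sided version of the test in Python. It is an illustration only, not the authors' analysis: the function names and the table layout are assumptions, and the underlying gene counts are not given in the text, so none are supplied here.

```python
from math import exp, lgamma


def _log_comb(n: int, k: int) -> float:
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def fisher_exact_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact test for enrichment in a 2x2 table.

    Hypothetical layout:
                      repeat present   repeat absent
        group 1             a                b
        group 2             c                d

    Returns P(X >= a) under the hypergeometric null with fixed margins.
    The two groups must be disjoint (for example, C2H2 genes versus all
    other genes), which is an assumption of this sketch.
    """
    row1 = a + b
    col1 = a + c
    total = a + b + c + d
    log_denominator = _log_comb(total, row1)
    p_value = 0.0
    # Sum the probabilities of every table at least as extreme as the observed one.
    for x in range(a, min(row1, col1) + 1):
        log_p = (_log_comb(col1, x)
                 + _log_comb(total - col1, row1 - x)
                 - log_denominator)
        p_value += exp(log_p)
    return min(p_value, 1.0)
```

Working in log space keeps the binomial coefficients from overflowing when the margins are in the tens of thousands, as they would be for a whole-genome gene set.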

To investigate further the evolution of gene families implicated in 
nervous system development and function, we surveyed genes assoc- 
iated with axon guidance (Table 1) and neurotransmission (Table 2), 
identifying their homologues in octopus and comparing numbers 
across a diverse set of animal genomes (Supplementary Notes 8-10). 
Several patterns emerged from this survey. The gene complements 
present in the model organisms D. melanogaster and C. elegans often 
showed striking departures from those seen in lophotrochozoans 
and vertebrates (Table 2 and Supplementary Note 10). For example, 
D. melanogaster encodes one member of the discs large (DLG) family, 
a key component of the postsynaptic scaffold. In contrast, mammals 
have four DLGs, which (along with other observations) led to suggestions
that vertebrates possess uniquely complex synaptic machinery.
However, we found three DLGs in both octopus and limpet, suggesting 
that vertebrate and fly gene number differences are not necessarily 
diagnostic of exceptional vertebrate synaptic complexity (Supplemen- 
tary Note 10.6). 

Overall, neurotransmission gene family sizes in the octopus were 
very similar to those seen in other lophotrochozoans (Table 2 and 
Supplementary Note 10), except for a few strikingly expanded gene 
families such as the sialic acid vesicular transporters (sialins) 
(Supplementary Note 10.2). We did find variations in the sizes of 
neurotransmission gene families between human and lophotrochozo- 
ans (Table 2 and Supplementary Note 10), but no evidence for sys- 
tematic expansion of these gene families in vertebrates relative to 
octopus or other lophotrochozoans. Although some gene families were 
larger in mammals or absent in lophotrochozoans (for example, 
ligand-gated 5-HT receptors), others were absent in mammals and 
present in invertebrates (for example, anionic glutamate and acetyl- 
choline receptors). The complement of neurotransmission genes 
in octopus may be broadly typical for a lophotrochozoan, but our 
findings suggest it is also not obviously smaller than is found in mam- 
mals. 

Among the octopus complement of ligand-gated ion channels, we
identified a set of atypical nicotinic acetylcholine receptor-like genes,
most of which are tandemly arrayed in clusters (Extended Data Fig. 7).
These subunits lack several residues identified as necessary for the
binding of acetylcholine, so it is unlikely that they function as acetylcholine
receptors. The high level of expression of these divergent subunits
within the suckers raises the interesting possibility that they act as
sensory receptors, as do some divergent glutamate receptors in other
protostomes. In addition, we identified 74 Aplysia-like and 11 vertebrate-like
candidate chemoreceptors among the octopus GPCR superfamily
of ~330 genes (Extended Data Fig. 6).

We found, amid extensive transcription of octopus transposons, 
that a class of octopus-specific short interspersed nuclear element 
sequences (SINEs) is highly expressed in neural tissues (Supplemen- 
tary Note 4 and Extended Data Fig. 8). Although the role of active 
transposons is unclear, elevated transposon expression in neural 
tissues has been suggested to serve an important function in learning 
and memory in mammals and flies.

Transposable element insertions are often associated with genomic 
rearrangements and we found that the transposon-rich octopus genome
displays substantial loss of ancestral bilaterian linkages that are
conserved in other species (Supplementary Note 6 and Extended Data 
Fig. 9). Interestingly, genes that are linked in other bilaterians but not 
in octopus are enriched in neighbouring SINE content. SINE inser- 
tions around these genes date to the time of tandem C2H2 expansion 
(Extended Data Fig. 9d), pointing to a crucial period of genome evolu- 
tion in octopus. Other transposons such as Mariner show no such 
enrichment, suggesting distinct roles for different classes of transpo- 
sons in shaping genome structure (Extended Data Fig. 9c). 

Transposable element activity has been implicated in the modification
of gene regulation across several eukaryotic lineages. We found
that in the nervous system, the degree to which a gene’s expression is
tissue-specific is positively correlated with the transposon load around
that gene (r² values ranging from 0.49 in the optic lobe to 0.81 in
the subesophageal brain; Extended Data Fig. 8 and Supplementary 
Note 4). This correlation may reflect modulation of gene expression 
by transposon-derived enhancers or a greater tolerance for transposon 
insertion near genes with less complex patterns of tissue-specific gene 
regulation. 

Using a relaxed molecular clock, we estimate that the octopus and
squid lineages diverged ~270 Mya, emphasizing the deep evolutionary
history of coleoid cephalopods (Supplementary Note 7.1 and
Extended Data Fig. 10a). Our analyses found hundreds of coleoid-
and octopus-specific genes, many of which were expressed in tissues
containing novel structures, including the chromatophore-laden skin,
the suckers and the nervous system (Extended Data Fig. 10 and
Supplementary Note 11). Taken together, these novel genes, the
expansion of C2H2 ZNFs, genome rearrangements, and extensive
transposable element activity yield a new landscape for both trans-
and cis-regulatory elements in the octopus genome, resulting in
changes in an otherwise ‘typical’ lophotrochozoan gene complement
that contributed to the evolution of cephalopod neural complexity and
morphological innovations.

Table 2 | Ion channel subunits

                                                Obi  Aca  Lgi  Cte  Dme  Cel  Hsa
Voltage-gated calcium channels                    8    8    6   10    9   10   10
Voltage-gated sodium channels                     3    2    3    2    4    0   13
Transient receptor potential channels            36   45   40   43   13   23   29
K+ channels
  Voltage-gated                                  30   23   29   20   10   51   40
  Calcium-activated, small/large conductance     12    8    9    6    3    6    8
  Inward rectifying                               3    4    5    6    4    3   16
  Two pore                                       12    9   12   14   11   47   15
  Non-voltage-gated                              27   21   26   26   18   72   39
Cys-loop receptors
  Glutamate                                      21   15   47   36   30   15   18
  Nicotinic acetylcholine                        53   16   52   77   10   88   16
  Inhibitory acetylcholine                        3    2    5    2    0    4    0
  5-HT3                                           0    0    0    0    0    1    5
  GABA                                            6    5    4    9    3    7   19
  Glutamate-gated chloride channels              7 5&6 8 5 1 6

Number of subunits of representative ion channel families in O. bimaculoides and across examined taxa.
Dendrogram above species names shows their evolutionary relationships. Aca, Aplysia californica.


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
METHODS

Data access. Genome and transcriptome sequence reads are deposited in the SRA 
as BioProjects PRJNA270931 and PRJNA285380. The genome assembly and 
annotation are linked to the same BioProject ID. A browser of this genome assem- 
bly is available at (http://octopus.metazome.net/). 

Genome sequencing and assembly. Genomic DNA from a single male Octopus
bimaculoides was isolated and sequenced using Illumina technology to 60-fold
redundant coverage in libraries spanning a range of pairs from ~350 bp to 10
kilobases (kb). These data were assembled with meraculous, achieving a contig
N50-length of 5.4 kb and a scaffold N50-length of 470 kb. The longest scaffold
contains 99 genes and half of all predicted genes are on scaffolds with 8 or more
genes (Supplementary Note 1).
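The contig and scaffold N50-lengths quoted above are summary statistics: the N50 is the length L such that sequences of length L or longer contain at least half of the assembled bases. A minimal Python sketch of that calculation follows; the function name and the toy lengths in the usage note are illustrative and are not taken from the assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    The N50 is the length of the sequence at which the running total,
    taken from longest to shortest, first reaches half of the summed length.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one length is required")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
```

For the toy input `[2, 2, 2, 3, 3, 4, 8, 8]` the total is 32, and the two longest sequences already cover 16 bases, so the N50 is 8.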

Genome size and heterozygosity. The O. bimaculoides haploid genome size was 
estimated to be ~2.7 gigabases (Gb) based on fluorescence (2.66-2.68 Gb) and 
k-mer (2.86 Gb) measurements (Supplementary Notes 1 and 2), making it several 
times larger than other sequenced molluscan and lophotrochozoan genomes.
We observed nucleotide-level heterozygosity within the sequenced genome to be 
0.08%, which may reflect a small effective population size relative to broadcast- 
spawning marine invertebrates. 

Transcriptome sequencing. Twelve transcriptomes were sequenced from RNA
isolated from ova, testes, viscera, posterior salivary gland (PSG), suckers, skin,
developmental stage 15 (St15), retina, optic lobe (OL), supraesophageal brain
(Supra), subesophageal brain (Sub), and axial nerve cord (ANC) (Supplementary
Note 2). RNA was isolated using TRIzol (Invitrogen) and 100-bp paired-end reads
(insert size 300 bp) were generated on an Illumina HiSeq2000 sequencing machine.

De novo transcriptome assembly. Adapters and low-quality reads were removed
before assembling transcriptomes using the Trinity de novo assembly package
(version r2013-02-25 (refs 34, 35)). Assembly statistics are summarized in
Supplementary Table 2.2. Following assembly, peptide-coding regions were
translated using TransDecoder in the Trinity package. We compared the de novo
assembled RNA-seq output to the genome to evaluate the completeness of the
genome assembly. To minimize the number of spuriously assembled transcripts,
only transcripts with ORFs predicted by TransDecoder were mapped onto the
genome with BLASTN. Only 1,130 out of 48,259 transcripts with ORFs (2.34%)
did not have a match in the genome with a minimum identity of 95%.

Annotation of transposable elements. Transposable elements were identified
with RepeatScout and RepeatModeler, and the masking was done with
RepeatMasker, as outlined in Supplementary Note 4.2. The most abundant
transposable element is a previously identified octopus-specific SINE that
accounts for 4% of the assembled genome.

Annotation of protein-coding genes. Protein-coding genes were annotated by 
combining transcriptome evidence with homology-based and de novo gene pre- 
diction methods (Supplementary Note 4). For homology prediction we used pre- 
dicted peptide sets of three previously sequenced molluscs (L. gigantea, C. gigas, 
and A. californica) along with selected other metazoans. Alternative splice
isoforms were identified with PASA. Annotation statistics are provided in
Supplementary Table 4.1.1. Genes known in vertebrates to have many isoforms, 
such as ankyrin, TRAK1 and LRCH1, also show alternative splicing in octopus but 
at a more limited level. Octopus genes with ten or more alternative splice forms are 
provided in Supplementary Table 4.1.2. 

Calibration of sequence divergence with respect to time. The divergence
between squid and octopus was estimated using r8s by fixing cephalopod
divergence from bivalves and gastropods to 540 Mya. Our estimate of 270 Mya for
the squid-octopus divergence corresponds to a mean neutral substitution rate of dS
~2 based on the protein-directed CDS alignments between the species
(Supplementary Fig. 6.1.2) and a dS estimation using the yn00 program. Throughout the
manuscript we convert from sequence divergence to time by assuming that dS
~1 corresponds to 135 million years. For example, unlinked octopus protocadherins
appear to have expanded ~135 Mya based on mean pairwise dS ~1, after
octopuses diverged from squid. In contrast, clustered octopus protocadherins are
much more similar in sequence (mean pairwise dS ~0.4, or ~55 Mya).
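The conversion described in this paragraph is a simple proportionality, which can be written out as a short Python helper. This is a sketch of the stated rule (dS ~1 corresponds to 135 million years); the constant and function names are illustrative, and the linear scaling is exactly the assumption the text makes.

```python
# Calibration stated in the text: dS ~1 corresponds to 135 million years.
MYR_PER_UNIT_DS = 135.0


def ds_to_mya(ds: float, myr_per_unit_ds: float = MYR_PER_UNIT_DS) -> float:
    """Convert a mean pairwise synonymous divergence (dS) to an age in Mya.

    Assumes divergence accumulates linearly with time, as in the text.
    """
    if ds < 0:
        raise ValueError("dS cannot be negative")
    return ds * myr_per_unit_ds
```

Under this rule, dS ~2 gives 270 Mya for the squid-octopus divergence, dS ~1 gives 135 Mya for the unlinked protocadherins, and dS ~0.4 gives 54 Mya, which the text reports as ~55 Mya for the clustered protocadherins.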

Quantifying gene expression. Transcriptome reads were mapped to the genome
assembly with TopHat 2.0.11 (ref. 42). A range of 76-90% of reads from the
different samples mapped to the genome. Mapped reads were sorted and indexed
with SAMtools. The read counts in each tissue were produced with the BEDTools
multicov program using the gene model coordinates. The counts were normalized
by the total transcriptome size of each tissue and by the length of the gene.
Heat maps showing expression patterns were generated in R using the heatmap.2
function.
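The normalization described above (read counts scaled by the total transcriptome size of each tissue and by gene length) and the heat-map colouring in standard deviations from the mean can be sketched as follows. This is a hedged illustration in Python rather than the authors' R code: the per-kilobase, per-million scaling and the use of the sample standard deviation are assumptions, and the function names are hypothetical.

```python
from statistics import mean, stdev


def normalise_counts(counts, gene_length_bp, library_sizes):
    """Scale one gene's raw read counts by gene length and library size.

    counts:         {tissue: raw read count for this gene}
    gene_length_bp: length of the gene model in base pairs
    library_sizes:  {tissue: total mapped reads in that tissue}
    Returns reads per kilobase per million mapped reads (assumed scaling).
    """
    per_kb = gene_length_bp / 1_000
    return {
        tissue: count / per_kb / (library_sizes[tissue] / 1_000_000)
        for tissue, count in counts.items()
    }


def row_z_scores(expression):
    """Express each tissue's value as standard deviations from the gene's mean."""
    values = list(expression.values())
    centre = mean(values)
    spread = stdev(values) if len(values) > 1 else 0.0
    if spread == 0:
        # A gene with identical expression everywhere has no deviation to show.
        return {tissue: 0.0 for tissue in expression}
    return {tissue: (value - centre) / spread
            for tissue, value in expression.items()}
```

Row-wise scaling of this kind makes genes with very different absolute expression levels comparable within one heat map, at the cost of hiding those absolute levels.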

Gene complement. Gene families of particular interest, including developmental
regulatory genes, neural-related genes, and gene families that appear to be
expanded in O. bimaculoides, were manually curated and analysed. We searched
the octopus genome and transcriptome assemblies using BLASTP and TBLASTN
with annotated sequences from human, mouse and D. melanogaster. Bulk analyses
were also performed using Pfam and PANTHER. We used BLASTP and
TBLASTX to search for specific gene families in deposited genome and
transcriptome databases for L. gigantea, A. californica, C. gigas, C. teleta, T. castaneum,
D. melanogaster, C. elegans, N. vectensis, A. queenslandica, S. kowalevskii, B. floridae,
C. intestinalis, D. rerio, M. musculus and H. sapiens. Candidate genes were verified
with BLAST and Pfam analysis. Genes identified in the octopus genome were
confirmed and extended using the transcriptomes. Multiple gene models that
matched the same transcript were combined. The identified sequences from
octopus and other bilaterians were aligned with either MUSCLE, CLUSTALO,
MacVector 12.6 (MacVector, North Carolina), or Jalview. Phylogenetic trees
were constructed with FastTree using the Jones-Taylor-Thornton model of
amino acid evolution, and visualized with FigTree v1.3.1.

Synteny. Microsynteny was computed based on metazoan node gene families 
(Supplementary Note 7). We used Nmax 10 (maximum of 10 intervening genes) 
and Nmin 3 (minimum of three genes in a syntenic block) according to the 
pipeline described in ref. 17 (Supplementary Note 6). To simplify gene family 
assignments we limited our analyses to 4,033 gene families shared among human, 
amphioxus, Capitella, Helobdella, Octopus, Lottia, Crassostrea, Drosophila and 
Nematostella. We required ancestral bilaterian syntenic blocks to have a minimum 
of one species present in both ingroups, or in one ingroup and one outgroup. To 
examine the effect of fragmented genome assemblies, we simulated shorter assem- 
blies by artificially fragmenting genomes to contain on average 5 genes per scaffold 
(Supplementary Note 6). 
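The two thresholds named in this paragraph (at most 10 intervening genes, at least three genes per block) can be illustrated with a small chaining routine. The Python sketch below covers only that thresholding step along a single scaffold, given the positions of genes already judged to belong to a candidate linkage; it is not the cross-species pipeline of ref. 17, and the function and parameter names are hypothetical.

```python
def chain_anchors(positions, n_max=10, n_min=3):
    """Group anchor genes on one scaffold into candidate syntenic blocks.

    positions: gene-order indices (0, 1, 2, ...) of anchor genes on a scaffold
    n_max:     maximum number of intervening genes allowed between neighbours
    n_min:     minimum number of anchor genes required to keep a block
    Returns a list of blocks, each a list of positions in ascending order.
    """
    blocks = []
    current = []
    for pos in sorted(set(positions)):
        # Genes strictly between two anchors are the intervening genes.
        if current and pos - current[-1] - 1 > n_max:
            if len(current) >= n_min:
                blocks.append(current)
            current = []
        current.append(pos)
    if len(current) >= n_min:
        blocks.append(current)
    return blocks
```

For example, anchors at positions 0, 5 and 16 form one block (4 and 10 intervening genes), whereas anchors at 40 and 41 are too few to be kept on their own.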

In comparison with other bilaterian genomes, we find that the octopus genome 
is substantially rearranged. In looking at microsyntenic linkages of genes with a 
maximum of 10 intervening genes, we found that octopus conserves only 34 out of 
198 ancestral bilaterian microsyntenic blocks; the limpet Lottia and amphioxus 
retain more than twice as many such linkages (96 and 140, respectively). This 
difference remains significant after accounting for genes missed through orthology 
assignment as well as simulations of shorter scaffold sizes (Supplementary Note 6; 
Extended Data Fig. 9b). Scans for intra-genomic synteny, and doubly conserved 
synteny with Lottia, were performed as described in Supplementary Note 6.

Transposable elements and synteny dynamics. The 5 kb upstream and downstream
regions of genes were surveyed for transposable element (TE) content. For
genes with non-zero TE load, their assignment to either conserved or lost bilater- 
ian synteny in octopus was done using the microsynteny calculation described 
above. The number of genes for each category and TE class were as follows: 484 
genes for retained synteny and 1,193 genes in lost synteny for all TE classes; 440 
and 1,107, respectively, for SINEs; and 116 and 290, respectively, for Mariner. 
Wilcoxon U-tests for the difference of TE load in linked versus non-linked genes 
were conducted in R. 

To assess transposon activity we assigned transcriptome reads aligned to 
5,496,558 annotated transposon loci using BEDTools. Of these, 2,685,265 loci
showed expression in at least one of the tissues. 

RNA editing. RNA-seq reads were mapped to the genome with TopHat, and
SAMtools was used to identify SNPs between the genomic and the RNA
sequences. To identify polymorphic positions in the genome, SNPs and indels
were predicted using GATK HaplotypeCaller version 3.1-1 in discovery mode
with a minimum Phred scaled probability score of 30, based on an alignment of
the 350 bp and 500 bp genomic fragment libraries using BWA-MEM version
0.7.6a. Using BEDTools, we removed SNPs predicted in both the transcriptome
and the genome and discarded SNPs that had a Phred score below 40 or
were outside of predicted genes. SNPs were binned according to the type of
nucleotide change and the direction of transcription. Candidate edited genes were
taken as those having SNPs with A-to-G substitutions in the predicted
mRNA transcripts.
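The filtering and binning steps described above can be sketched in Python as follows. The point of the example is the strand handling: a genomic T-to-C mismatch in a gene transcribed from the minus strand reads as A-to-G in the direction of transcription. The record layout (the `ref`, `rna`, `strand`, `qual`, `in_gene` and `in_genomic_snps` keys) and the function names are hypothetical, not the authors' data format.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def transcript_change(ref_base, rna_base, strand):
    """Describe a DNA-RNA mismatch in the direction of transcription."""
    ref, alt = ref_base.upper(), rna_base.upper()
    if strand == "-":
        # Minus-strand genes are read from the complementary strand.
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    elif strand != "+":
        raise ValueError(f"unknown strand: {strand!r}")
    return f"{ref}-to-{alt}"


def bin_candidate_edits(sites, min_qual=40):
    """Count DNA-RNA differences by change type after the three filters.

    Drops sites that were also called as genomic polymorphisms, sites with a
    Phred score below min_qual, and sites outside predicted genes.
    """
    bins = Counter()
    for site in sites:
        if site["in_genomic_snps"] or site["qual"] < min_qual or not site["in_gene"]:
            continue
        bins[transcript_change(site["ref"], site["rna"], site["strand"])] += 1
    return bins
```

Candidate edited genes in the sense used above would then be those contributing to the `"A-to-G"` bin.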

Cephalopod-specific genes. Cephalopod novelties were obtained by BLASTP 
and TBLASTN searches against the whole NR database and a custom database 
of several mollusc transcriptomes (Supplementary Note 11.1). To ensure that we 
had as close to full-length sequence as possible, we extended proteins predicted 
from octopus genomic sequence with our de novo assembled transcriptomes, 
using the longest match to query NR, transcriptome and EST sequences from 
other animals. Gene sequences with transcriptome support but without a match 
to non-cephalopod animals at an e-value cutoff of 1 x 10~* were considered for 
further analysis. Octopus sequences with a match of 1 X 10° or better to a 
sequence from another cephalopod were used to construct gene families, which 
were characterized by their BLAST alignments, HMM, PFAM-A/B, and 
UNIREF90 hits. The cephalopod-specific gene families are listed in the Source 
Data file for Extended Data Fig. 10. Octopus-specific novelties were defined as
sequences with transcriptome support but without any matches to sequences 
from any other animals (<1 X 10°), including nautiloid and decapodiform 
cephalopods. 
Extended Data Figure 1 | RNA editing in octopus. a, Approximate
maximum likelihood tree of adenosine deaminases acting on RNA (ADARs) in
bilaterians. ADAR1, ADAR2, ADAR-like/ADAD, and ADAT (tRNA-specific
adenosine deaminase) were identified in Hsa, Mmu, Cin, Dme, Cte, Lgi,
D. opalescens (Dop), and Obi with Shimodaira-Hasegawa-like support
indicated at the nodes. b, O. bimaculoides ADAR1, ADAR2 and ADAR-like
proteins contain one or two double-stranded RNA binding domains (dsRBD)
as well as an adenosine deaminase domain. ADAR1 also has a z-alpha
domain. c, Expression profiles of the three ADAR genes found in 12
O. bimaculoides tissues by RNA-seq profiling. d, DNA-RNA differences in O.
bimaculoides show prominent A-to-G changes. Histogram illustrates the
number of DNA-RNA differences detected between coding sequences in
the genome and 12 O. bimaculoides transcriptomes after filtering out
polymorphisms identified in genomic sequencing. Differences were binned
by the type of change (see key) in the direction of transcription. A-to-G
changes are the most prevalent, particularly in neural tissues and during
development, paralleling the expression of octopus ADARs in c. Other types
of changes were also detected at lower levels, possibly resulting from
uncharacterized polymorphisms.
Extended Data Figure 2 | Local arrangement of Hox gene complement in
O. bimaculoides and selected bilaterians. At the top, the four compact Hox
clusters of H. sapiens and the single B. floridae cluster are depicted. The
D. melanogaster Hox complex is split into two clusters. We included genes in
the D. melanogaster locus that are homologues of Hox genes but have lost their
homeotic function, such as fushi tarazu (ftz), bicoid, zen and zen2 (the latter
three are represented as overlapping boxes). Hox genes in C. teleta are found
on three scaffolds. L. gigantea has a single cluster with the full known
lophotrochozoan gene complement. In O. bimaculoides many of the scaffolds
are several hundred kb long, and no two Hox genes are on the same scaffold.
The positions of O. bimaculoides genes approximate their locations on
scaffolds. Dashed lines indicate that the scaffold continues beyond what is
shown. Scaffold length is depicted to scale with size noted on the left. Genes are
positioned to illustrate orthology, which is also highlighted by colour.
Extended Data Figure 3 | Gene complement and gene architecture
evolution in metazoans. a, Principal component analysis of gene family
counts. O. bimaculoides highlighted in green. Deuterostomes are indicated in
blue, ecdysozoans in red, lophotrochozoans in green, and sponges and
cnidarians in orange. Xtr, Xenopus tropicalis; Gga, Gallus gallus; Tca, Tribolium
castaneum; Dpu, Daphnia pulex; Isc, Ixodes scapularis; Ava, Adineta vaga;
Spu, S. purpuratus; Hma, Hydra magnipapillata; Adi, Acropora digitifera.
For methods, see Supplementary Note 7.4. b-d, MrBayes tree (constrained
topology) on binary characters of presence or absence of Pfam domain 
architectures (b), introns (c), or indels (d); scale bar represents estimated 
changes per site. For methods, see Supplementary Note 7.3. 
Extended Data Figure 4 | Protocadherin genes within a genomic cluster are
similar in sequence and sites of expression. a, Expression profile of the 31
protocadherin genes located on Scaffold 30672 in 12 octopus transcriptomes.
Over three-quarters of the protocadherins are highly expressed throughout
central brain, OL and ANC, while the others show more mixed distributions.
b, Phylogenetic tree highlighting Scaffold 30672 protocadherins in grey
bars. c, Expression profile of the 17 protocadherin genes located on
Scaffold 9600. Almost all of these protocadherins are most highly expressed in
nervous tissues, with the exception of Ocbimv220039316m, which is most
highly expressed in the St15 sample. d, Phylogenetic tree highlighting Scaffold
9600 protocadherins in grey bars. As seen in b, protocadherins of the same
scaffold tend to cluster together on the tree. Order of the genes in the heat maps
(a, c) follows the ordering on the corresponding scaffold.


©2015 Macmillan Publishers Limited. All rights reserved 



Extended Data Figure 5 | Expansion of interleukin 17 (IL17)-like genes. 

a, Phylogenetic tree of interleukin genes in Obi, Cte, Cgi, and Lgi. Mammalian
IL1A, IL1B, and IL7 used as outgroups. Human and mouse IL17s branch
from other members of the IL family. Octopus ILs (as well as all identified
invertebrate ILs) group with the mammalian IL17 branch and are named
‘IL17-like’. The 31 octopus genes are distributed across 5 scaffolds: scaffold A 
(Obi_A), 23 members; scaffold B (Obi_B), 4 members; scaffold C (Obi_C), 2 
members; scaffolds D (Obi_D) and E (Obi_E), 1 member each. b, Expression 
profile of 31 octopus IL17-like genes. Heat map rows are arranged by order 
on each scaffold. Blank rows indicate genes not expressed in our 
transcriptomes. The 27 genes found in our transcriptomes have strong 
expression in the suckers and skin. The scaffold C genes are enriched in the PSG 
and the Scaffold D gene is enriched in the viscera. c, Conserved cysteine
residues in human IL17 and invertebrate IL17-like proteins. The human IL17
proteins share a conserved cysteine motif comprising 4 cysteine residues, which
may form interchain disulfide bonds and facilitate dimerization. Octopus
IL17-like proteins also contain this four-cysteine motif, highlighted in yellow. 
One octopus sequence encodes only 3 of these highly conserved cysteine 
residues. These four cysteines are also present to varying degrees in Lottia, 
Capitella and Crassostrea sequences. Two additional conserved cysteine 
residues were found in the octopus sequences and are highlighted in red. The 
first cysteine residue is found in all invertebrate sequences examined, and none 
of the mammalian IL17 sequences. 
Extended Data Figure 6 | G-protein-coupled receptors. GPCRs, also
known as 7-transmembrane (7TM) or serpentine receptors, form a large
superfamily that activates intracellular second messenger systems upon ligand
binding. This figure considers a subset of the 329 GPCRs we identified in
O. bimaculoides. The full complement of GPCRs is presented in Supplementary
Note 8.5. a, b, As reported for other lophotrochozoan genomes, the
octopus genome contains chemosensory-like GPCRs; 74 GPCRs are similar to
the Aplysia chemosensory GPCRs and 11 GPCRs are similar to vertebrate
olfactory receptors. c, We identified 4 opsins in the octopus genome (from
top to bottom): rhodopsin, rhabdomeric opsin, peropsin, and retinochrome.
d, The octopus class F GPCRs comprise 6 genes: 5 Frizzled genes and 1
Smoothened gene (*). e, Thirty octopus genes show similarity to vertebrate
adhesion GPCRs. 
Extended Data Figure 7 | O. bimaculoides acetylcholine receptor (AChR)
subunits. a, Phylogenetic tree of AChR subunit genes identified in Hsa, Mmu,
Dme, Cte, Lgi and Obi. Black asterisk indicates a Dme sequence that groups
with alpha 1-4-like subunits despite lacking two defining cysteine residues.
b, Expression profiles of octopus AChR subunits. Genes ordered as in the tree
(a), starting from the grey arrow and continuing anticlockwise. Putative
non-ACh-binding subunits are highly expressed in the suckers. One sequence
was not detected in our transcriptome data sets. In a and b, red asterisks
indicate subunits with the substitution known to confer anionic permissivity.
c, Divergent octopus subunits lack nearly all residues necessary for ACh
binding. Alignment of sequence flanking the cysteine loop (yellow) of the
L. stagnalis ACh-binding protein (Lst_AchBP), the human and octopus alpha-7
receptor subunits (Hsa_AchR7, Obi_10697+), and the 23 divergent AChR
subunits. Essential ACh-binding residues on the primary (pink) and
complementary (blue) side of the ligand-binding domain are indicated, with
conservative substitutions in a lighter shade. Outside of the binding residues, 
residues shared between the alpha-7 subunits are shaded in light grey, with 
bold letters for conservative substitutions. 
Extended Data Figure 8 | Active transposable elements and gene
expression specificity. a, Transposable element expression across 12 tissues.
b, Correlation between the total transposable element (TE) load (in bp) in the
5 kb regions flanking the gene and the fraction of genes with tissue-specific
expression (defined as having at least 75% of expression in a single tissue; see
Source Data file for this figure). P value indicates the F-statistic for the
significance of linear regression (H0: r = 0), with tissues with a P value ≤0.05
indicated in pink.
Extended Data Figure 9 | Synteny dynamics in octopus and the effect of
transposable element (TE) expansions. a, Circos plot showing shared synteny
across 6 genomes. Individual scaffolds are plotted according to bp length;
scaffolds with no synteny are merged together (lighter arcs). Despite the large
size of the octopus genome, only a small proportion of the scaffolds show
synteny. b, Synteny reduction in octopus quantified based on synteny inference
using gene families with at least one representative in human, amphioxus,
Capitella, Helobdella, Octopus, Lottia, Crassostrea, Drosophila, and
Nematostella. Drosophila, Helobdella and Octopus show the highest synteny
loss rates. Branch lengths, estimated with MrBayes, reflect extent of local
genome rearrangement (Supplementary Note 6). c, Enrichment of overall and
specific TE classes (base pairs masked) around genes from ancient bilaterian
synteny blocks, including those absent in octopus (see key). Asterisks indicate
Mann-Whitney U-test with P value <0.02. d, Transposable element insertion
history (Jukes-Cantor distance adjusted, see text) into the vicinity of genes
from ‘lost’ synteny blocks. Note that only one SINE peak is present; a more
recent peak (visible in ‘All genomic SINEs’) cannot be recovered from
those insertions.
Extended Data Figure 10 | Cephalopod phylogeny and novelties. a, Whole-genome-derived
phylogeny of molluscs and select other phyla showing the
relative position of octopus at the base of the coleoid cephalopods. For methods
see Supplementary Note 7.1. Members of the cephalopod class are indicated in
blue, scale indicates number of substitutions per site. b, Phylogenetic tree of
reflectin genes. Reflectins are cephalopod-specific genes that allow for rapid
and reversible changes in iridescence. Six reflectin genes were identified in the
octopus genome. c, d, Novel gene expression across multiple tissues. Bars depict
all cephalopod novelties; dark grey indicates sequences with no similarity to
non-cephalopod genes using HMM searches (see Source Data for this figure).
c, Counts of tissue-specific novelties in a given tissue. d, Proportion of
expression of novel genes versus total expression in individual tissues. CNS
(central nervous system) combines Supra, Sub, OL and ANC expression data.
Extended Data Figure 11 | Phylogenetic tree of cadherin genes. This is a
larger image of Fig. 2a. Phylogenetic tree of cadherin genes in Hsa (red), Dme
(orange), Nematostella vectensis (mustard yellow), Amphimedon queenslandica
(yellow), Cte (green), Lgi (teal), Obi (blue), and Saccoglossus kowalevskii
(purple). I, Type I classical cadherins; II, calsyntenins; III, octopus
protocadherin expansion (168 genes); IV, human protocadherin expansion (58
genes); V, dachsous; VI, fat-like; VII, fat; VIII, CELSR; IX, Type II classical
cadherins. Asterisk denotes a novel cadherin with over 80 extracellular
cadherin domains found in Obi and Cte.


Identification of cis-suppression of human disease
mutations by comparative genomics


Daniel M. Jordan, Stephan G. Frangakis, Christelle Golzio, Christopher A. Cassa, Joanne Kurtzberg,
Task Force for Neonatal Genomics, Erica E. Davis, Shamil R. Sunyaev & Nicholas Katsanis


Patterns of amino acid conservation have served as a tool for
understanding protein evolution. The same principles have also
found broad application in human genomics, driven by the need to
interpret the pathogenic potential of variants in patients. Here we
performed a systematic comparative genomics analysis of human
disease-causing missense variants. We found that an appreciable
fraction of disease-causing alleles are fixed in the genomes of other
species, suggesting a role for genomic context. We developed a
model of genetic interactions that predicts most of these to be
simple pairwise compensations. Functional testing of this model
on two known human disease genes revealed discrete cis amino
acid residues that, although benign on their own, could rescue the
human mutations in vivo. This approach was also applied to
ab initio gene discovery to support the identification of a de novo
disease driver in BTG2 that is subject to protective cis-modification
in more than 50 species. Finally, on the basis of our data and models,
we developed a computational tool to predict candidate residues
subject to compensation. Taken together, our data highlight the
importance of cis-genomic context as a contributor to protein evolution;
they provide an insight into the complexity of allele effect on
phenotype; and they are likely to assist methods for predicting allele
pathogenicity.

Understanding the nature and prevalence of genetic interactions has
the potential to elucidate the evolutionary forces that act on protein
residues, protein complexes and, more broadly, genomes. Some studies
have reported that interactions are ubiquitous and contribute
considerably to the evolutionary landscape, while others found that
interactions are rare. Even among those who agree that genetic
interactions are important, the architecture of these interactions remains
unclear: some studies find distinct interactions between two or three
sites, while others propose a complex interaction network, effectively
responding to aggregate properties of the entire protein or the
entire genome.

One practical utility of comparative genomics has been highlighted
by our appreciation of the large number of rare variants in
humans and the difficulty in inferring their contribution to disease.
To prioritize variants of interest, frequency in control populations and
evolutionary conservation have become two prominent filters.
Conserved regions are considered more likely to be intolerant of variation;
programs such as PolyPhen and SIFT have employed this
principle to predict functional effects of variants. Although useful,
these strategies are constrained, in part because they do not take into
consideration the genomic context of the mutated allele. An allele can
appear damaging in one sequence yet be neutral in an orthologous
sequence of another species. This phenomenon, referred to as compensated
pathogenic deviation (CPD), contributes an unknown, but
potentially large, number of false negatives to the evaluation of functional
sites.


To examine the prevalence of CPDs, and to identify such sites, we
used comparative genomics. A typical, non-CPD allele should cause
the same phenotype in any orthologous sequence, regardless of genetic
background. By contrast, when a variant that causes human genetic
disease is found in a wild-type orthologous sequence, it is likely that the
genetic background of that species exerts a compensatory effect on
such a variant: it suppresses the phenotype, and thus protects the
variant from negative selection. Previous studies have used this
insight to quantify the fraction of CPD interspecies substitutions at
~10% (refs 14-16). Other studies have reported estimates of the
inverse value, namely the fraction of pathogenic variants that are present
as CPDs in other species, ranging from 2% to 18% (refs 17, 18). We set
out to produce a new estimate of this value. We collected two data sets

[Figure 1 panel labels: a, HumVar ‘disease’ (22,207 variants); ClinVar ‘pathogenic’ (10,596 variants). b, Largest (12%).]
Figure 1 | Distribution of variants found in sequence alignments. a, Venn
diagram showing sizes and overlap of the ClinVar and HumVar data sets,
and how many are found in the multiple sequence alignment. b, Estimated
number of human disease variants found in the alignment. The smallest
estimate (3.0%, dark blue) comes from using the intersection of both variant data
sets, requiring the variant to be absent from 6,503 human exomes, and filtering
out alignments with low-quality scores. With any methodology, at least 88%
of human disease variants (grey) are not found in the alignment. c, d, Potential
mechanisms for the occurrence of CPDs in evolution. Branches where the
variant is fixed are purple; branches where the variant is pathogenic are red. In
c, the variant is present neutrally in an ancestor, but is lost in primates.
Subsequent substitutions cause the ancestral allele to become pathogenic. In d, the
variant is pathogenic in the ancestor, but mutations in a non-human branch
cause it to become tolerated, and it arises later by mutation and becomes fixed.
Figure 2 | Relationship between variants and evolutionary distance.
a, Model for fixation of CPDs. Neutral changes (crosses) arise neutrally. Some
of these (blue) compensate for alleles that would otherwise be pathogenic. The
parameter k represents the number of compensatory changes required for each
pathogenic allele. Once k compensatory changes have fixed, the CPD (red) fixes
neutrally. b, The relationship between evolutionary distance and the number of
variants in the alignment is expected to be different for individual benign
variants (black) and pathogenic variants with different numbers of
compensations (blue, one; red, two; cyan, five; magenta, ten). c, d, Observed
distribution of missense variants annotated as neutral (c) or pathogenic (d) in
HumVar and present in vertebrate orthologues (bars), with maximum
likelihood fits (black lines) and 95% confidence bands (grey shading). Panel
d corresponds to a fitted value for k of 1.44 ± 0.07.


of missense single-nucleotide variants (SNVs), annotated as either
benign or pathogenic, derived from two databases, one based on the
literature (‘HumVar’) and one based on clinical genetic laboratories
and investigator submissions (‘ClinVar’). Although the two
databases are not fully independent, the majority of pathogenic variants
were listed in one or the other (Fig. 1a). Overall, these data sets
comprised 69,905 human missense mutations across 13,040 genes. We
compared this data set to orthologous proteins from 100 vertebrates.
As expected, we found the mutant residue for a large number of likely
neutral human variants to be fixed in orthologues. However, the number
of pathogenic missense variants found in orthologues (CPDs) was
surprisingly high: 5.6% ± 0.5% of ClinVar variants and 6.7% ± 0.4%
of HumVar variants were found in the alignment of mammals.
For all vertebrates, these numbers increase to 10.2% ± 0.7% and
12.0% ± 0.5%, respectively (Table 1).
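The counting step described above can be illustrated with a minimal sketch. It assumes a toy data layout (a dictionary from species name to aligned protein sequence, and variants given as gene, 0-based alignment column and variant residue); the names `species_with_variant`, `cpd_fraction`, `GENE_A` and `GENE_B` are hypothetical and are not taken from the study's pipeline.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: species name -> aligned protein sequence (equal lengths).
Alignment = Dict[str, str]


def species_with_variant(alignment: Alignment, column: int, variant_aa: str,
                         reference: str = "human") -> List[str]:
    """Return the non-reference species whose residue at `column` equals the
    human disease-associated residue `variant_aa`."""
    hits = []
    for species, seq in alignment.items():
        if species == reference:
            continue
        if 0 <= column < len(seq) and seq[column] == variant_aa:
            hits.append(species)
    return sorted(hits)


def cpd_fraction(variants: List[Tuple[str, int, str]],
                 alignments: Dict[str, Alignment]) -> float:
    """Fraction of variants (among those with a retrievable alignment) whose
    variant residue is present in at least one orthologue."""
    usable = [v for v in variants if v[0] in alignments]
    if not usable:
        return 0.0
    found = sum(1 for gene, col, aa in usable
                if species_with_variant(alignments[gene], col, aa))
    return found / len(usable)


# Toy data, invented for illustration only.
toy_alignments = {
    "GENE_A": {"human": "MKNLV", "mouse": "MKNLV", "zebrafish": "MKHLV"},
    "GENE_B": {"human": "MSTPA", "mouse": "MSTPA", "zebrafish": "MSTPG"},
}
toy_variants = [("GENE_A", 2, "H"), ("GENE_B", 1, "W")]

print(species_with_variant(toy_alignments["GENE_A"], 2, "H"))
print(cpd_fraction(toy_variants, toy_alignments))
```

A real analysis would also need to handle alignment gaps, isoforms and alignment quality, which this sketch ignores.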

Mindful of the possibility of false pathogenic annotations, we
applied several filtering steps, including cross-referencing HumVar
and ClinVar variant annotations with population frequency data,
filtering on the basis of alignment quality, using alternative alignment
methodologies, and requiring variants to be present in multiple species
(Table 1 and Supplementary Note). Some filters did remove bona fide
recessive alleles (since we did not allow carriers), as well as disease
variants with incomplete penetrance, even though this class of alleles
is, by definition, sensitive to genomic context and thus likely to be
affected by compensation. Nevertheless, all filtering steps retained

Table 1 | Range of estimates for prevalence of CPDs in human disease

                        Unfiltered MultiZ    High-quality MultiZ
                        alignment            alignments
HumVar                  12.0% ± 0.5%         11.5% ± 0.5%
ClinVar                 10.2% ± 0.7%         9.9% ± 0.7%
HumVar+ClinVar          9.3% ± 1.0%          8.5% ± 1.0%
HumVar+ClinVar+ESP      7.5% ± 1.0%          7.0% ± 1.0%

a substantial number of variants (Supplementary Tables 1 and 2). 
Including only variants that pass all filtering steps and are detected 
in >1 vertebrate, we predict that the minimum estimate of CPDs in 
human patients is 3% (Fig. 1b). This is consistent with previous analyses,
which have found that stringent filtering does not change the
observed properties of CPDs to any notable extent. As a final test,
we extracted post hoc pathogenic alleles from three different sources, 
each of which used independent means for assessing pathogenicity in 
acute paediatric disorders; overall, CPD rates ranged once again 
between 3% and 9%; additional analyses of other possible sources of 
bias were likewise consistent with our initial observations and previous 
studies (see Supplementary Note and Supplementary Tables 3, 4, 5). 

We next turned to the question of the structure of the genetic 
interactions underlying such sites. Broadly, there are two possibilities: 
suppression of the disease phenotype may be the result of a small 
number of discrete compensatory substitutions; or suppression may 
be caused by a global shift in the properties of the gene, or the whole 
genome, caused by numerous substitutions that, individually, have 
small effects. The difference between these two models should be visible 
in the distribution of CPDs among orthologous sequences. During 
evolution, variants arise stochastically through a Poisson process: the 
expected amount of evolutionary time required to produce a given 
substitution is distributed exponentially. For a CPD, the distribution
should be different; the presence of a CPD mandates the presence of all 
compensatory substitutions necessary for the CPD to be rendered 
neutral. As such, the expected evolutionary time required to produce 
a CPD is the sum of the times required to produce each compensatory 
substitution, followed by the time required to produce the CPD. 

Previous studies have proposed different processes by which CPDs 
arise. The most plausible option is a neutral mechanism, in which the 
compensatory substitutions are neutral and arise/fix neutrally before 
the pathogenic substitution appears (Fig. 1c, d and Fig. 2a). In this case, 
the time required for each substitution to arise is given by an expo- 
nential distribution, and the time for all compensatory sites to arise is 
approximated by the convolution of multiple exponential distributions 
(a gamma distribution, in the case where all exponential distributions 
are identical). The number of exponential distributions included in the 
convolution corresponds to one plus the number of compensatory 
substitutions required, and it can be inferred from the shape of the 
distribution (Fig. 2b). 
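The link between the number of compensatory substitutions and the shape of the waiting-time distribution can be checked numerically. The sketch below is an illustration under simplifying assumptions (all substitutions share one rate, chosen arbitrarily as 1.0; the function names are hypothetical): the waiting time for a CPD needing k compensations is drawn as the sum of k + 1 exponential waits, which should then have the mean and variance of a gamma distribution with shape k + 1.

```python
import random
import statistics


def cpd_waiting_time(k: int, rate: float, rng: random.Random) -> float:
    """Waiting time for k compensatory substitutions followed by the CPD itself:
    the sum of k + 1 independent exponential waits with a common rate."""
    return sum(rng.expovariate(rate) for _ in range(k + 1))


def simulate(k: int, rate: float, n: int, seed: int = 0) -> list:
    """Draw n waiting times with a fixed seed for reproducibility."""
    rng = random.Random(seed)
    return [cpd_waiting_time(k, rate, rng) for _ in range(n)]


# One compensatory substitution (pairwise compensation), arbitrary rate of 1.0.
k, rate = 1, 1.0
times = simulate(k, rate, n=20000)
sim_mean = statistics.fmean(times)
sim_var = statistics.pvariance(times)

# A gamma distribution with shape k + 1 and scale 1 / rate has
# mean (k + 1) / rate and variance (k + 1) / rate**2.
print(f"simulated mean {sim_mean:.3f}, gamma mean {(k + 1) / rate:.3f}")
print(f"simulated variance {sim_var:.3f}, gamma variance {(k + 1) / rate ** 2:.3f}")
```

With k = 0 the same code reduces to a plain exponential wait, the case described for an ordinary neutral substitution.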

Although the evolutionary time separating two sequences is not 
observable directly, we can approximate it using sequence distance 
(one minus sequence identity). We plotted the number of missense
variants observed as a function of sequence distance for neutral var- 
iants and for CPDs. Qualitatively, the shapes of both distributions 
match theoretical expectations. The two distributions are distinct from 
each other (P= 1.6 X 10-°: Kolmogorov-Smirnov two-sample test; 
Supplementary Tables 6, 7). Additionally, the observed distribution of 
CPDs is weighted towards shorter evolutionary distances, as expected 
if most CPDs require a small number of individual compensatory 
substitutions, as opposed to the normal distribution expected if 
CPDs require many individual compensatory substitutions (Fig. 2b, d). 
To obtain a more precise estimate of the number of compensatory 
substitutions, we used maximum likelihood to fit several versions of 
the convolution-of-exponentials model with different combinations 
of variant data sets and alignment strategies (Fig. 2c, d; see Methods 


Table 1 (continued)

                        Mammalian subset of    Present in >1 species    EPO alignment
                        MultiZ alignment       in MultiZ alignment
HumVar                  6.7% ± 0.4%            6.1% ± 0.3%              7.5% ± 0.4%
ClinVar                 5.6% ± 0.5%            4.7% ± 0.5%              6.5% ± 0.6%
HumVar+ClinVar          5.3% ± 0.8%            3.9% ± 0.7%              5.5% ± 0.9%
HumVar+ClinVar+ESP      3.8% ± 0.7%            3.0% ± 0.6%              4.0% ± 0.8%

Fraction of likely pathogenic mutations in humans considered to be CPDs according to different filtering paradigms. Values represent the fraction of variants for which an alignment could be retrieved where the 
variant amino acid is present in an orthologue sequence; error ranges are Jeffreys 95% confidence intervals. ESP, NHLBI Exome Sequencing Project; EPO, Enredo-Pecan-Ortheus pipeline. 
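For readers unfamiliar with the Jeffreys interval named in the table legend: for x successes in n trials it is given by the 2.5% and 97.5% quantiles of a Beta(x + 0.5, n - x + 0.5) distribution. The sketch below approximates those quantiles by sampling, using only the Python standard library; the counts (120 of 1,000) are invented for illustration and are not the study's counts, and the function name is hypothetical.

```python
import random


def jeffreys_interval_mc(successes: int, trials: int, level: float = 0.95,
                         draws: int = 100_000, seed: int = 0):
    """Approximate the Jeffreys interval for a binomial proportion by sampling
    from Beta(successes + 0.5, trials - successes + 0.5) and reading off
    empirical quantiles. Monte Carlo error is small but not zero."""
    rng = random.Random(seed)
    a = successes + 0.5
    b = trials - successes + 0.5
    samples = sorted(rng.betavariate(a, b) for _ in range(draws))
    tail = (1.0 - level) / 2.0
    lower = samples[int(tail * draws)]
    upper = samples[int((1.0 - tail) * draws) - 1]
    return lower, upper


# Hypothetical counts: 120 variants found in orthologues out of 1,000 aligned.
lower, upper = jeffreys_interval_mc(120, 1000)
print(f"point estimate {120 / 1000:.3f}, 95% interval ({lower:.3f}, {upper:.3f})")
```

An exact computation would use the inverse of the regularized incomplete beta function from a statistics library; sampling is used here only to keep the example dependency-free.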
Figure 3 | Compensatory mutations rescue pathogenic alleles in BBS4 and
RPGRIP1L. a, The pathogenic BBS4 165H allele is fixed in six species.
Secondary sites 160, 163 and 366 are possible CPDs. b, The pathogenic
RPGRIP1L 937L allele is fixed in four species. The 189L, 193L and 961T alleles
are present in all four species. c, Examples of zebrafish convergent extension
phenotypic groups. d, Human RNA encoding the BBS4 165H mutation and

and Supplementary Tables 6, 7). Most versions of the model fit best as 
the convolution of approximately two exponential distributions, sup- 
porting a mechanism in which most CPDs are compensated by simple 
pairwise interactions. Additionally, most models reported similar 
rates of evolution for neutral variants, CPDs and compensatory var- 
iants, suggesting that the target size for compensatory changes is 
small. We repeated these analyses with multiple different variant data 
sets and alignment strategies, finding similar results each time 
(Extended Data Fig. 1 and Supplementary Table 8). 
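The fitting idea can be sketched in a simplified form. The example below is not the model fitted in the study: it assumes identical exponential rates (so the convolution is a gamma distribution), uses simulated distances rather than real alignment data, and estimates the gamma shape by a grid search over the profile likelihood. Under that simplification, the fitted shape minus one plays the role of the number of compensatory substitutions. All names and parameter values are illustrative.

```python
import math
import random


def gamma_loglik(n: int, sum_x: float, sum_log_x: float,
                 shape: float, scale: float) -> float:
    """Log-likelihood of n observations under Gamma(shape, scale), written in
    terms of the sufficient statistics sum(x) and sum(log x)."""
    return ((shape - 1.0) * sum_log_x - sum_x / scale
            - n * (math.lgamma(shape) + shape * math.log(scale)))


def fit_gamma_shape(data, grid):
    """Grid-search maximum likelihood estimate of the gamma shape. For each
    candidate shape the scale is set to its conditional optimum, mean / shape."""
    n = len(data)
    sum_x = sum(data)
    sum_log_x = sum(math.log(x) for x in data)
    mean = sum_x / n
    return max(grid, key=lambda a: gamma_loglik(n, sum_x, sum_log_x, a, mean / a))


# Simulated "sequence distances" drawn with shape 2 (one compensation plus the CPD).
rng = random.Random(1)
data = [rng.gammavariate(2.0, 0.1) for _ in range(5000)]
grid = [1.0 + 0.05 * i for i in range(81)]  # candidate shapes from 1.0 to 5.0

shape_hat = fit_gamma_shape(data, grid)
print(f"fitted shape {shape_hat:.2f}, implied compensations {shape_hat - 1.0:.2f}")
```

A fuller treatment would allow distinct rates per substitution and account for how species are sampled across distances, neither of which is attempted here.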

These analyses predict that most CPDs could be rescued by one 
large-effect compensatory substitution. We tested this prediction 
experimentally. We posited that each vertebrate sequence that includes 
a CPD should also include its cis-compensatory allele. Therefore, every 
amino acid difference between the human sequence and the sequence 
of the orthologue(s) containing a CPD is a candidate compensatory 
substitution. Given the practical constraints of examining all possible 
compensatory substitutions in macromolecular complexes, we focused 
on substitutions within the same gene as the CPD. 
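Enumerating candidate compensatory substitutions from one orthologue can be sketched as a column-by-column comparison. The example assumes the two sequences are already aligned to equal length and that positions are reported as 1-based alignment columns; the sequences and the function name are invented for illustration.

```python
from typing import List, Tuple


def candidate_compensatory_sites(human: str, orthologue: str,
                                 cpd_pos: int) -> List[Tuple[int, str, str]]:
    """List (position, human residue, orthologue residue) for every aligned
    column where the orthologue differs from human, skipping the CPD column
    itself and any column containing a gap. Positions are 1-based columns."""
    if len(human) != len(orthologue):
        raise ValueError("sequences must be aligned to the same length")
    sites = []
    for pos, (h, o) in enumerate(zip(human, orthologue), start=1):
        if pos == cpd_pos:
            continue  # the CPD site is the variant being explained
        if h == "-" or o == "-":
            continue  # gaps are not substitutions
        if h != o:
            sites.append((pos, h, o))
    return sites


# Toy aligned sequences; column 3 carries the disease-associated residue.
human_seq = "MKNLVAR"
ortho_seq = "MRHLVSR"
print(candidate_compensatory_sites(human_seq, ortho_seq, cpd_pos=3))
```

When several species carry the CPD, one possible refinement (an assumption, not something stated in the text) is to run this per species and compare the resulting lists.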

Scanning our list of candidate CPDs, we noted two alleles in genes
involved in ciliopathies: a protein-encoding p.N165H change in BBS4
and a p.R937L variant in RPGRIP1L, which contribute pathogenic
alleles to Bardet-Biedl syndrome and Meckel-Gruber syndrome,
respectively. These alleles were prioritized because: (1) Bardet-Biedl
and Meckel-Gruber syndromes have a severe effect on reproductive
fitness; (2) previous studies have established loss-of-function
zebrafish phenotypes rescuable by human messenger RNA for both
genes; (3) in vivo complementation has indicated both alleles to be
deleterious to human protein function; and (4) we observed multiple
species with the human mutant allele fixed: six species for BBS4 165H
and four for RPGRIP1L 937L (Fig. 3a, b). For this reason, both alleles
were predicted to be benign (PolyPhen-2, SIFT, MutationAssessor).


either 366R or 366T can rescue the morphant phenotype; RNA encoding
165H mutation alone cannot. WT, wild type. e, Mutation of 189L, 193L or
961T, in the background of 937L RPGRIP1L mRNA, rescues the loss of
function observed in 937L RNA. Significance was determined by χ² test.
See Supplementary Table 9 for embryo counts.


Comparative genomic analysis identified 9 candidate sites in BBS4 
and 32 candidate sites in RPGRIP1L (Supplementary Table 9). To test 
each site, we took advantage of the established convergent extension 
defects induced by morpholino (MO)-mediated suppression of bbs4 
or rpgrip1l in zebrafish. Consistent with previous observations,
suppression of bbs4 or rpgrip1l induced convergent extension defects 
in 80% and 50% of embryos respectively (n = 50-100 embryos; 
Fig. 3c-e). Co-injection of MO with human wild-type mRNA rescued 
this phenotype, whereas injection with human mutant mRNA showed 
no improvement (Fig. 3d, e). We next tested the entire candidate 
complementing allelic series for each gene. For BBS4, the introduction 
of 2/9 candidate residues in cis with the 165H-encoding mRNA ame- 
liorated the phenotype in a manner indistinguishable from wild-type 
mRNA. Strikingly, both complementing alleles affected the same 
amino acid and were specific to the compensatory changes: the 
165H/366N and the 165H/366S behaved as null, whereas 165H/ 
366R was indistinguishable from wild type; 165H/366T converted 
the functional null to a hypomorph (Fig. 3d and Extended Data 
Fig. 2a). 

We observed a similar pattern for RPGRIP1L. Testing each of the 32
candidate sites identified three complementing events, two of which 
map to the same region: 937L/189L, 937L/193L and 937L/961T (Fig. 3e 
and Extended Data Fig. 2b). Testing each complementing allele indi- 
vidually showed them to be either extremely mild or benign 
(Supplementary Table 9). Finally, comparative genomic analysis 
showed that these data could explain the tolerance of the RPGRIP1L 
937L change in all four species and of the BBS4 165H change in 4/6 
species (Fig. 3a, b). 

The above analysis is limited by its retrospective nature. We therefore
tested the usefulness of our model in ab initio gene discovery. We
have recently initiated a whole-exome sequencing (WES) and functional
testing paradigm to accelerate gene discovery in young children
called Task Force for Neonatal Genomics (TFNG). Patients who
display anatomical phenotypes amenable to functional modelling in
zebrafish are evaluated by trio-based WES and have candidate alleles
tested systematically in vivo.

Figure 4 | A de novo BTG2 p.V141M-encoding allele causes microcephaly.
a, Pedigree DM048. Chromatograms show a de novo c.421G>A nucleotide
change. WT, wild type. b, Suppression of btg2 leads to head size defects. Dorsal
view of uninjected control and btg2 MO-injected zebrafish embryos at 4 dpf.
White arrows show the distance measured from forebrain to hindbrain. Red
line shows the protrusion of the pectoral fins in uninjected controls.
c, Distribution of head size measurements at 4 dpf (Supplementary Table 10;

We enrolled a 17-month-old female with an undiagnosed neuroanatomical
condition hallmarked by microcephaly (Fig. 4a). We filtered
WES data for non-synonymous and splice variants with a minor
allele frequency of <1%, and we conducted a proband-centric trio
analysis that yielded four candidates: de novo missense changes in
BTG2 and NOS2; and recessive missense variants in TTN and
LAMA1. Testing of an unaffected sibling excluded LAMA1; TTN, a
known dominant cardiomyopathy locus, is an unlikely driver.
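The filtering logic described here (rare non-synonymous or splice variants, then classification by inheritance within the trio) can be sketched as follows. The record layout, consequence labels, genotype codes and example variants are all hypothetical; a real pipeline would work from annotated VCF files and would also consider compound heterozygous and X-linked models, which are omitted.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical consequence labels treated as non-synonymous or splice-affecting.
FUNCTIONAL = {"missense", "nonsense", "frameshift", "splice"}


@dataclass(frozen=True)
class Variant:
    gene: str
    consequence: str   # e.g. "missense", "synonymous", "splice"
    maf: float         # minor allele frequency in a control population
    proband_gt: str    # "ref", "het" or "hom"
    mother_gt: str
    father_gt: str


def is_rare_functional(v: Variant, maf_cutoff: float = 0.01) -> bool:
    """Keep non-synonymous and splice variants with MAF below the cutoff."""
    return v.consequence in FUNCTIONAL and v.maf < maf_cutoff


def inheritance(v: Variant) -> Optional[str]:
    """Classify a variant from the proband's perspective within a trio."""
    if v.proband_gt == "het" and v.mother_gt == "ref" and v.father_gt == "ref":
        return "de novo"
    if v.proband_gt == "hom" and v.mother_gt == "het" and v.father_gt == "het":
        return "recessive"
    return None


def trio_candidates(variants):
    """Return (gene, inheritance model) for variants passing both steps."""
    out = []
    for v in variants:
        model = inheritance(v)
        if is_rare_functional(v) and model is not None:
            out.append((v.gene, model))
    return out


# Invented example records, not patient data.
example = [
    Variant("GENE1", "missense", 0.0, "het", "ref", "ref"),
    Variant("GENE2", "missense", 0.004, "hom", "het", "het"),
    Variant("GENE3", "synonymous", 0.0, "het", "ref", "ref"),
    Variant("GENE4", "missense", 0.15, "hom", "het", "het"),
    Variant("GENE5", "splice", 0.001, "het", "het", "ref"),
]
print(trio_candidates(example))
```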

To investigate the pathogenicity of the BTG2 (p.V141M) and NOS2 
(p.P795A) protein-encoding changes, we studied btg2 and nos2 in 
zebrafish. Reciprocal use of Basic Local Alignment Search Tool 
(BLAST) between Homo sapiens and Danio rerio identified a single 
zebrafish btg2 orthologue and two zebrafish nos2 orthologues. We 
injected splice-blocking MO (sb-MO) or translational-blocking MO 
(tb-MO) (Extended Data Fig. 3) into zebrafish embryos (3 ng; n = 80 
embryos per injection) and scored for head size defects at 4 days post- 
fertilization (dpf) by measuring the anterior—posterior distance 
between the forebrain and the hindbrain (Fig. 4b). For nos2a/b MO- 
injected embryos, we saw no differences at the highest dose injected 
(8 ng for nos2a/b sb-MOs; Supplementary Table 10). By contrast, we 
found a significant reduction of anterior structures in btg2 morphants 
(P<0.0001; Fig. 4b, c). Co-injection of wild-type human BTG2 
mRNA with tb-MO resulted in significant rescue (P< 0.0001; 
Fig. 4c). In contrast, injection of mRNA harbouring 141M was signifi- 
cantly worse at rescue than wild type (P < 0.0001; Fig. 4c). 

BTG2 is a regulator of cell cycle checkpoint in neuronal cells and is
strikingly intolerant to variation in humans (Exome Variant Server
(EVS)). To test the pathogenicity of 141M by a different assay, we
performed antibody staining at 2 dpf (a time before the manifestation
of microcephaly). We marked post-mitotic neurons in the forebrain
with antibodies against neuronal HuC/HuD antigens, and we scored
(blind, triplicate) on the basis of an established paradigm. btg2
white arrows in b), a.u., arbitrary units. d, 2 dpf zebrafish embryos stained for 
PH3. Human RNA containing the V141M mutation is unable to rescue the 
reduced proliferation of btg2 morphants. Scale bars, 250 µm. e, Quantification 
of PH3-positive cells: human RNA with mutations V141M and either R80K 
or L128V can rescue knockdown of btg2. Error bars represent standard 
deviation. f, The 141M allele is fixed in 59/87 species besides primates, examples 
displayed here. See Supplementary Table 11 for PH3 quantification. 


morphants displayed a significant decrease in HuC/HuD staining 
(P< 0.0001; Extended Data Fig. 4). This defect was rescued with 
wild-type BTG2 mRNA (P< 0.05), but could not be ameliorated by 
141M-encoded mRNA co-injection (Extended Data Fig. 4). 
Importantly, co-injection of btg2 tb-MO with two rare control EVS 
alleles (p.A126S and p.R145Q) resulted in rescue, providing evidence 
for assay specificity (Extended Data Fig. 4b). As a third test, we stained 
whole embryos with a phospho-histone H3 (PH3) antibody that marks 
proliferating cells. We counted the number of positive cells in a defined 
anterior region of embryos. We saw a significant reduction in cell 
proliferation in the heads of 2 dpf btg2 morphants (P < 0.0001); this 
defect was likewise rescued by co-injection of wild-type mRNA, while 
141M mutant rescue was indistinguishable from btg2 tb-MO alone 
(P = 0.38; Fig. 4b, d). Combined, all three assays indicated that 
BTG2 p.V141M is pathogenic and that haploinsufficiency of this gene 
probably contributes to the microcephaly of the proband. 

Despite our functional and genetic data for p.V141M, this allele was 
predicted computationally to be benign. A likely reason is that, with 
the exception of primates, most BTG2 orthologues encode Met at the 
orthologous position (Fig. 4f). These data suggested that V141 might 
represent a CPD site in primates that branched from the ancestral 
methionine. To test this possibility, we identified nine BTG2 sites that 
co-evolved with 141M (Supplementary Table 11), which we mutagen- 
ized into the human construct encoding 141M. We then injected 
embryos with btg2 MO; MO plus wild-type human BTG2 mRNA; 
MO plus 141M-encoding mRNA; or MO plus 141M in cis with one 
of the nine candidate complementing alleles. Seven of the alleles had 
no effect (Supplementary Table 11). However, R80K- or L128V- 
encoded mRNA on the 141M backbone rescued the number of 
PH3-positive cells to wild-type levels (Fig. 4e and Extended Data 
Fig. 2c); both alleles were benign on their own (Supplementary 
Table 11). Taken together, our data indicated that 141M is deleterious 
in the human background, but the protection of this residue conferred 
by either Lys 80 or by Val 128 can explain >90% (54/59) of species 
encoding 141M (Fig. 4f). 

To improve the scalability of detecting CPDs, we used our model of 
CPD evolution to develop a computational predictor for distinguishing 
variants that are unlikely to be CPDs from those that might be 
CPDs, and to identify candidate compensations to aid experimental 
design (http://genetics.bwh.harvard.edu/cpd/). Initial testing of this 
tool intimated high negative predictive values but modest positive 
predictive values, probably due to the dearth of known CPDs 
(Supplementary Note). 

Our results contrast with some previous studies that claim that 
epistasis is ubiquitous; or that it is practically nonexistent; or that 
it is commonly of higher order. The most likely explanation for this 
discrepancy is that such studies have examined different kinds of 
variation and traits. For example, studies on the evolution of genetic 
incompatibilities rely on assumptions of high mutation rate and weak 
negative selection, assumptions that generally do not hold for the case 
of pathogenic missense variation. The difference with the studies 
suggesting higher-order cis-interactions may be to do with the scale of 
evolutionary time our analyses probe: the span of hundreds of millions 
of years of evolution represented by the vertebrate alignment may not 
be long enough to reveal higher-order combinations of non-synonym- 
ous SNVs. Indeed, using neutral SNVs from the HumVar data set as a 
control, we estimate the vertebrate alignment has explored 12% of 
pairwise interactions between SNVs, compared to 0.6% of three-way 
interactions between SNVs. It is possible that higher-order interac- 
tions are common, but are not detectable without a deeper alignment. 

Finally, considering the accelerated use of genome editing to model 
human pathogenic mutations in a variety of model organisms, our data 
highlight the critical need to not only pair computational predictions 
with functional studies, but also to evaluate the effect of human muta- 
tions in the context of the human sequence. 


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
METHODS 


Data sets of known benign and pathogenic variants. Our primary training data 
set was HumVar, one of the training data sets for PolyPhen-2.2.3. We used the 
most recent public release at the time of this publication (December 2011), avail- 
able for download at http://genetics.bwh.harvard.edu/pph2/dokuwiki/downloads. 
This data set is derived from SwissVar variant annotations'’. It contains 22,207 
variants annotated as pathogenic and 21,433 variants annotated as benign. We also 
used a data set of pathogenic variants derived from the June 2014 release of the 
ClinVar database”. This data set consists of all missense variants from ClinVar 
that are unambiguously (that is, classified the same by all submitters) and con- 
fidently (that is, not a ‘Likely’ annotation) annotated as ‘Pathogenic’. It contains 
10,596 variants annotated as pathogenic and 1,926 variants annotated as benign. 
The intersection of these two data sets contains 3,563 variants annotated as patho- 
genic and 454 variants annotated as benign. As an additional control, we required 
that the pathogenic variants be absent from 6,503 human exomes* (EVS; http:// 
evs.gs.washington.edu/EVS/). This most stringent data set contains 3,062 variants 
annotated as pathogenic. 

Comparative genomics screen for CPDs. We used the University of California, 
Santa Cruz (UCSC) MultiZ whole-genome alignments of 100 vertebrate 
sequences*’, downloaded from UCSC as translated exons. As an alternative align- 
ment strategy, we used the EPO alignment of 37 eutherian mammal species”, 
downloaded as nucleotide sequences and translated for all aligned species using 
the human open reading frame. In cases in which the alignment contained mul- 
tiple sequences from the same species, only the sequence most similar to the 
human sequence was retained. Variants were classified as CPDs if the variant 
amino acid was found in the translated sequence of any vertebrate orthologue 
other than human or chimpanzee, with chimpanzee being excluded because pres- 
ence in the chimpanzee sequence may be used as evidence for neutrality in variant 
annotation databases. The resulting data set of neutral and deleterious variants 
found in vertebrate orthologues is available (see Source Data for Fig. 1). 
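
The classification rule above amounts to a lookup in one alignment column. The sketch below is illustrative only: the dictionary layout (species name mapped to an aligned protein sequence, with `-` for gaps), the species labels and the function names are assumptions, and the step that keeps one sequence per species is omitted.

```python
# Species whose sequence does not count as evidence, following the rule above.
EXCLUDED_SPECIES = {"human", "chimpanzee"}


def species_with_variant(alignment, position, variant_aa):
    """List non-excluded species whose aligned residue equals the variant.

    alignment  : dict mapping species name to aligned protein sequence
    position   : 0-based column index in the alignment
    variant_aa : one-letter code of the variant amino acid
    """
    return sorted(
        species
        for species, seq in alignment.items()
        if species not in EXCLUDED_SPECIES and seq[position] == variant_aa
    )


def is_cpd(alignment, position, variant_aa):
    """True if any vertebrate orthologue other than human or chimpanzee
    carries the variant amino acid at this position."""
    return bool(species_with_variant(alignment, position, variant_aa))


# Toy alignment (made-up sequences, not real orthologues).
toy = {
    "human":      "MKVL",
    "chimpanzee": "MKML",
    "mouse":      "MKML",
    "zebrafish":  "MRVL",
}
print(species_with_variant(toy, 2, "M"))
```

In this toy case the chimpanzee residue is ignored and the call rests on the mouse sequence alone.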

Statistical models of variant density. We modelled the density of benign variants 
with an exponential distribution, with scale parameter $\beta_{\mathrm{neut}}$ representing the mean 
time to fixation of neutral alleles. We used three different models for the density of 
CPDs. (1) $k$ compensatory changes fix at rate $1/\theta$, followed by the now-neutral 
CPD at the same rate. This is represented by a gamma distribution with shape 
parameter $k + 1$ and scale parameter $\theta$. (2) $k$ compensatory changes fix at rate $1/\theta_1$, 
followed by the now-neutral CPD at an independent rate $1/\theta_2$. This is represented 
by the convolution of a gamma distribution with shape parameter $k$ and scale 
parameter $\theta_1$, and an exponential distribution with scale parameter $\theta_2$. (3) $k$ 
compensatory changes fix at rate $1/\beta_{\mathrm{comp}}$, followed by the now-neutral CPD at 
the neutral rate $1/\beta_{\mathrm{neut}}$. 
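
For concreteness, the densities named above can be written out with the Python standard library. This is a sketch rather than the published implementation: the function names are illustrative, and the convolution used by models 2 and 3 is approximated with a simple midpoint rule instead of a closed form.

```python
import math


def exponential_pdf(d, scale):
    """Exponential density with the given scale (mean); used for benign variants."""
    if d < 0:
        return 0.0
    return math.exp(-d / scale) / scale


def gamma_pdf(d, shape, scale):
    """Gamma density, evaluated in log space for numerical stability."""
    if d <= 0:
        return 0.0
    log_pdf = ((shape - 1) * math.log(d) - d / scale
               - math.lgamma(shape) - shape * math.log(scale))
    return math.exp(log_pdf)


def cpd_density_model1(d, k, theta):
    """Model 1: k compensations and the CPD all fix at rate 1/theta."""
    return gamma_pdf(d, k + 1, theta)


def convolved_density(d, k, scale_comp, scale_cpd, steps=2000):
    """Models 2 and 3: gamma(k, scale_comp) convolved with exponential(scale_cpd).

    Integrates gamma_pdf(x) * exponential_pdf(d - x) over x in (0, d)
    with the midpoint rule.
    """
    if d <= 0:
        return 0.0
    h = d / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        total += gamma_pdf(x, k, scale_comp) * exponential_pdf(d - x, scale_cpd)
    return total * h
```

Model 3 corresponds to calling `convolved_density` with the compensation scale and the neutral scale; when both scales are equal the result reduces to model 1, which gives a convenient consistency check.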

We assume a reversible model of evolution, so that the same three models can 
apply both to the fixation of CPDs not present in an ancestral sequence and to the 
loss of CPDs that are present in an ancestral sequence. The random variable used 
in these models is the sequence distance to the closest sequence containing the 
variant, where sequence distance is defined as the fraction of aligned (that is, non- 
gapped) positions that are identical. 

Fitting observed density to statistical models. We recorded for each variant 
found in a vertebrate orthologue the minimum number of amino acid differences 
between that variant and a vertebrate orthologue, not counting gapped sites, 
normalized by the length of the sequence. We then used maximum likelihood 
to fit the neutral model and each of the three pathogenic models described earlier, 
using the Broyden-Fletcher-Goldfarb-Shanno (BFGS) optimization algorithm to 
maximize the likelihood functions. We repeated the fit using each of our filtered 
variant data sets and alignment methodologies, as well as discarding all variants 
that were only found in the alignment in a single sequence. All three models fit 
reasonably well, and all produced qualitatively similar results on all data sets and 
alignment methodologies. The exact fitted parameter values are found in 
Supplementary Table 8. 
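
The quantity being fitted can be computed as follows. This sketch makes several assumptions: the alignment is a dictionary of equal-length aligned sequences with `-` for gaps, distance is normalized by the number of non-gapped columns, and all function names are hypothetical. For the exponential (neutral) model the maximum-likelihood scale is simply the sample mean, which is a useful check on a numerical optimizer; the gamma-based models have no such shortcut and need a routine such as BFGS.

```python
def normalized_distance(seq_a, seq_b):
    """Fraction of non-gapped aligned positions at which two sequences differ."""
    aligned = 0
    diffs = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue  # gapped sites are not counted
        aligned += 1
        if a != b:
            diffs += 1
    if aligned == 0:
        raise ValueError("no aligned positions")
    return diffs / aligned


def min_distance_to_variant(alignment, position, variant_aa,
                            reference="human",
                            excluded=("human", "chimpanzee")):
    """Smallest distance from the reference to any orthologue carrying the variant.

    Returns None when no non-excluded sequence carries the variant.
    """
    ref = alignment[reference]
    distances = [
        normalized_distance(ref, seq)
        for species, seq in alignment.items()
        if species not in excluded and seq[position] == variant_aa
    ]
    return min(distances) if distances else None


def exponential_scale_mle(distances):
    """Closed-form maximum-likelihood scale of an exponential: the sample mean."""
    return sum(distances) / len(distances)


# Toy alignment (made-up sequences).
toy = {"human": "MKVLA", "mouse": "MKMLA", "fish": "MRM-A"}
print(min_distance_to_variant(toy, 2, "M"))
```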

Prediction method. Our prediction method is implemented in Perl, and the 
source code is available (Supplementary Data 1-CPD Predictor Code). To cal- 
culate the probability that a variant is a CPD, we find the minimum distance to the 
variant in the multiple sequence alignment and apply Bayes’ Law. We use the third 
likelihood model described earlier, where the CPD fixes at the neutral rate, using 
the maximum likelihood inferred parameter values. As our prior we use the 
well-established result that 10% of variants seen in another sequence are CPDs. 
Candidate compensation sites are identified by collecting all substitutions found in 
any sequence containing the candidate CPD, prioritizing sites that are substituted 
in many sequences over sites that are substituted in only a few sequences. 

WES. Research study participants were enrolled upon informed consent accord- 
ing to protocols approved by the Duke University Internal Review Board. We 
conducted paired-end pre-capture library preparation by fragmenting genomic 
DNA through sonication, ligating it to the Illumina multiplexing PE adapters, and 
PCR amplification using indexing primers. For target enrichment/exome capture 
we enriched the pre-capture library by hybridizing to biotin-labelled VCRome 2.1 
(ref. 34) in-solution Exome Probes at 47 °C for 64-72 h. For massively parallel 
sequencing, the post-capture library DNA was subjected to sequence analysis on 
an Illumina HiSeq platform for 100 bp paired-end reads (130× median coverage, 
>95% target coverage at 10×). Primary data were interpreted and analysed by 
Mercury 1.0; the output data from Illumina HiSeq were converted from bcl files to 
FastQ files by Illumina CASAVA 1.8 software, and mapped by the BWA program. 
We performed variant calls using Atlas-SNP and Atlas-indel. 

Morpholino design. MOs targeting zebrafish bbs4 and rpgrip1l were obtained 
from Gene Tools, and described previously*”°. MOs against zebrafish btg2 and 
nos2 were obtained from Gene Tools (Extended Data Fig. 3; sequences available 
upon request). 

Site-directed mutagenesis. Mutant alleles were generated as described”. 
Sequences were validated via Sanger sequencing on Applied Biosystems 3730xl 
DNA Analyzer. 

mRNA synthesis and zebrafish embryo injection. mRNA was transcribed in 
vitro as described** using SP6 Message Machine kit (Ambion). MO and mRNA 
concentrations were determined based on the combination by which wild-type 
mRNA efficiently rescued the morphant phenotype. The same concentrations 
were used for rescue with mutant mRNA or injection of mRNA alone. The MO 
and mRNA concentrations injected were as follows: 0.7 ng bbs4 MO and 100 pg 
BBS4 mRNA; 5 ng rpgrip1l MO and 100 pg RPGRIP1L mRNA; 3 ng btg2 MO and 
150 pg BTG2 mRNA; 8 ng nos2a; 8 ng nos2b. All animal work was performed in 
accordance with the protocols and guidelines of the Duke Institutional Animal 
Care and Use Committee. 

Classification and scoring of embryos. Embryos injected with bbs4 or rpgrip1l 
MOs were classified into two graded phenotypes on the basis of the relative 
severity compared with age-matched uninjected controls from the same clutch, 
as described previously. Comparisons between injections of MO alone, mRNA 
alone, mutant rescue, and wild-type rescue were performed by χ² test. 

Embryos injected with btg2 MO were fixed in 4% paraformaldehyde at either 
2 dpf or 4 dpf. Two days post-fertilization embryos were stained for HuC/HuD or 
PH3. HuC/HuD was scored and quantified as described. Four days post-fertilization 
embryos were transferred to 1× PBS and bright-field dorsal images were 
captured; we assessed head size by measuring the distance from the anterior-most 
region of the forebrain to the hindbrain as defined by the attachment of the 
pectoral fins. 

PH3-positive cell quantification was done using the Image-based Tool for 
Counting Nuclei (ITCN) plugin for the ImageJ software. Rolling background 
subtraction (25-pixel radius) and outlier removal (2.5-pixel radius, threshold = 5) 
were used to process images. Linear measurement of a typical cell was used to 
determine cell radius for ITCN analysis. Threshold level was set to 0.5. Statistical 
comparisons between groups were performed with a Student’s t-test. 

Extended Data Figure 1 | Different alignment methodologies with HumVar 
and ClinVar produce qualitatively similar alignments. a, b, Distributions of 
missense variants annotated as neutral (a) or pathogenic (b) in the HumVar 
and ClinVar data sets, with each of the five alignment strategies described 
in the text (MultiZ unfiltered, MultiZ mammals-only, EPO, MultiZ with 
alignment quality filter, MultiZ with >1 sequence filter). All distributions 
are quantitatively similar. Compare with Fig. 2c, d. 

Extended Data Figure 2 | Protein domain structure of functionally tested 
human disease genes. a, Schematic of BBS4 (519 amino acids) is depicted with 
eight tetratricopeptide (TPR) domains (yellow); b, RPGRIP1L (1,315 amino 
acids) has multiple coiled-coil domains (green rectangles) and two protein 
kinase C conserved region 2 (C2) domains (green hexagons); and c, BTG2 
(158 amino acids) has one BTG1 domain (purple pentagon). Disease-causing 
alleles are shown with red stars; complementing alleles are represented with 
blue stars; amino acid number scale in increments of 100 is shown below 
each schematic. 

Extended Data Figure 3 | Evaluation of btg2 and nos2a/b MOs. 
a-c, Schematic of the D. rerio btg2, nos2a and nos2b loci. Blue boxes, exons; 
dashed lines, introns; white boxes, untranslated regions; red boxes, MOs; ATG 
indicates the translational start site; arrows, polymerase chain reaction with 
reverse transcription (RT-PCR) primers; number indicates the targeted exon. 
d, e, Agarose gel images of nos2a/b RT-PCR products. 

Extended Data Figure 4 | HuC/HuD staining and quantification of 2 dpf 
zebrafish embryos confirms pathogenicity of BTG2 V141M. a, Suppression 
of btg2 leads to a decrease of HuC/HuD levels at 2 dpf. Representative ventral 
images of control, btg2 morphants (images show unilateral or absent 
HuC/HuD expression), and a rescued embryo injected with a btg2 MO plus human 
BTG2 wild-type (WT) mRNA. Scale bar, 250 µm. b, Percentage of embryos 
with normal, bilateral HuC/HuD protein levels in the anterior forebrain or 
decreased/unilateral HuC/HuD protein levels in embryos injected with btg2 
MOs alone or MOs plus human BTG2 wild-type or variant mRNAs (p.V141M, 
index case; p.A126S and p.R145Q, control alleles). *P < 0.05 (two-tailed t-test 
comparisons between MO-injected and rescued embryos; n = 38-78 per 
injection batch). 

Genetic compensation induced by deleterious 
mutations but not gene knockdowns 

Andrea Rossi, Zacharias Kontarakis, Claudia Gerri, Hendrik Nolte, Soraya Hölper, 
Marcus Krüger & Didier Y. R. Stainier 


Cells sense their environment and adapt to it by fine-tuning their 
transcriptome. Wired into this network of gene expression control 
are mechanisms to compensate for gene dosage. The increasing use 
of reverse genetics in zebrafish, and other model systems, has 
revealed profound differences between the phenotypes caused 
by genetic mutations and those caused by gene knockdowns at 
many loci, an observation previously reported in mouse and 
Arabidopsis. To identify the reasons underlying the phenotypic 
differences between mutants and knockdowns, we generated muta- 
tions in zebrafish egfl7, an endothelial extracellular matrix gene of 
therapeutic interest, as well as in vegfaa. Here we show that egfl7 
mutants do not show any obvious phenotypes while animals 
injected with egfl7 morpholino (morphants) exhibit severe vas- 
cular defects. We further observe that egfl7 mutants are less sens- 
itive than their wild-type siblings to Egfl7 knockdown, arguing 
against residual protein function in the mutants or significant 
off-target effects of the morpholinos when used at a moderate dose. 
Comparing egfl7 mutant and morphant proteomes and transcrip- 
tomes, we identify a set of proteins and genes that are upregulated 
in mutants but not in morphants. Among them are extracellular 
matrix genes that can rescue egfl7 morphants, indicating that they 
could be compensating for the loss of Egfl7 function in the pheno- 
typically wild-type egfl7 mutants. Moreover, egfl7 CRISPR inter- 
ference, which obstructs transcript elongation and causes severe 
vascular defects, does not cause the upregulation of these genes. 
Similarly, vegfaa mutants but not morphants show an upregula- 
tion of vegfab. Taken together, these data reveal the activation of a 
compensatory network to buffer against deleterious mutations, 
which was not observed after translational or transcriptional 
knockdown. 

Interfering with a gene’s function is a widely used strategy to 
decipher its role. Several different approaches have been developed 
over the years to achieve this goal. Yet, despite having the same goal 
of functional inactivation, different strategies, namely knockdown (via 
antisense) and knockout (via genetic inactivation), often lead to dif- 
ferent phenotypes. These discrepancies could be caused by off-target 
effects of the knockdown reagents, the generation and use of hypo- 
morphic alleles, or other and more fundamental reasons. The recent 
development of new genome engineering techniques, such as TAL 
effector nucleases (TALENs) and clustered regularly interspaced short 
palindromic repeats (CRISPRs), is allowing the facile generation of 
mutations and has revived concerns over the lack of specificity of 
knockdown reagents’*. In several cases, toxicity due to off-target 
effects, induction of p53 (also known as tp53) transcription, interferon 
response, engagement of toll-like receptors and/or saturation of the 
RNA interference machinery can lead to phenotypes unrelated to the 
silencing of the target gene*’. To investigate further whether toxicity 
effects are the main reason for the differences between genetic mutation 
and gene knockdown phenotypes, we analysed the EGF-like-domain, 
multiple 7 (egfl7) gene. The egfl7 gene is a good candidate to address this 


question because of the lack of obvious phenotypes in the mouse 
mutants'*” and the severe vascular tube formation defects observed 
in knockdown experiments in zebrafish, frogs and human cells’*"*. 

We first generated egfl7 mutants using TALENs targeting exon 3, 
which encodes part of the EMI domain (Fig. 1a and Extended Data 
Fig. 1). This domain precedes other domains critical for Egfl7 activity, 
including EGF domains and the leucine-valine-rich carboxy (C) terminus 
(Fig. 1a). We identified several deletion alleles including a Δ3 
and a Δ4 (Fig. 1b). The egfl7 Δ3 (hereafter egfl7^s980) allele encodes a 
protein that lacks a non-conserved proline at position 50 (p.P50del) 
while the egfl7 Δ4 allele (hereafter egfl7^s981) is predicted to encode a 
truncated polypeptide containing a stretch of 29 incorrect amino acids 
starting with a Gln to Leu substitution at position 49 (p.Gln49Leufs*30) 
(Fig. 1b). To investigate the severity of these mutant alleles, we first 
examined egfl7 transcript levels by quantitative PCR (qPCR). The premature 
stop codon in egfl7^s981 led to a decrease of approximately 50% 
in transcript levels compared with wild-type (WT) and egfl7^s980 mutant 
embryos, indicating an increased messenger RNA (mRNA) degradation 
rate (Extended Data Fig. 2a). To characterize the different egfl7 
mutant alleles further, we cloned the egfl7 WT, s980 and s981 complementary 
DNAs (cDNAs) in a mammalian expression vector and transfected 
HUVEC cells. Unlike Egfl7 WT and Egfl7^s980, the Egfl7^s981 
protein was mostly absent in the medium or the cells, suggesting that 
this truncated polypeptide is rapidly degraded and/or poorly translated 
and secreted (Extended Data Fig. 2b). Altogether, these data indicate 
that egfl7^s981 is a severe mutant allele, possibly even a null. 

To analyse Egfl7 function during vascular development, the egfl7^s981 
mutant fish were crossed into the Tg(kdrl:HRAS-mCherry) and 
Tg(kdrl:GFP) backgrounds. We also developed a robust method 
based on high-resolution melt analysis to identify the different genotypes 
(Extended Data Fig. 1b). Surprisingly, no differences in gross 
morphology were evident between egfl7 WT and mutant animals. 
However, a sporadic onset of brain haemorrhage was observed in 
fewer than 5% of the mutant animals at 72 hours post-fertilization 
(hpf) (Fig. 1c, d). Besides the haemorrhagic foci, no obvious abnormalities 
were detected in vasculogenesis, angiogenesis or circulation in 
any region of the brain or the rest of the body (Fig. 1e, f and Extended 
Data Fig. 3). Moreover, egfl7^s981 mutant animals survive to become 
fertile adults. In summary, while egfl7 morphants exhibit severe vascular 
defects, egfl7 mutants exhibit very mild, if any, phenotypes. 

This discrepancy between mutant and morphant phenotypes could 
be explained by several reasons including morpholino (MO) off-target 
effects. We thus sought to assess the specificity and toxicity of egfl7 
MO. First, to evaluate the effectiveness of the egfl7 MO, we engineered 
the egfl7 locus through the co-injection of TALENs and a single- 
stranded DNA (ssDNA) donor encoding a Myc-tag (Extended Data 
Fig. 4), and generated a stable transgenic line. We then injected Egfl7 
Myc-tag embryos with 1 ng of egfl7 MO and analysed protein levels 
by western blot at 24hpf. The relative expression of Egfl7 Myc-tag 
was reduced by approximately 80% in the morphants compared with 
uninjected embryos, revealing the ability of the MO to inhibit egfl7 
mRNA translation. A widely recognized MO off-target effect is the 
transcriptional activation of p53 (ref. 9). We thus measured p53 
expression by qPCR and observed no significant difference between 
embryos injected with 1 ng of MO and uninjected embryos. However, 
p53 expression was clearly induced in embryos injected with 2 or 4ng 
of MO (Extended Data Fig. 5). We next reasoned that if the egfl7 MO 
did not induce off-target effects, it should not cause defects in egfl7 null 
mutants. Thus, we injected embryos obtained from egfl7^s981/+ 
incrosses with 1 ng of egfl7 MO. We subsequently selected and genotyped 
32 embryos that showed a vascular phenotype, namely intersegmental 
vessel defects, reduced circulatory loop and/or pericardial 
oedema (Fig. 2a). Notably, we found that these embryos did not follow 
the Mendelian pattern observed in controls: 17 embryos were WT, 12 
heterozygous and only 3 mutant (Fig. 2b, c), suggesting that the egfl7 
mutants were less sensitive than WT to MO injections. Confocal 
micrographs of WT, heterozygous and mutant embryos injected with 
1 ng of egfl7 MO (Fig. 2a) support this hypothesis. To investigate why 
some mutant embryos showed a phenotype when injected with egfl7 
MO, we repeated this experiment using a lower MO dose (0.5 ng). This 
experiment resulted in a clear reduction of the number of mutants in 
the selected population (1 mutant, 20 heterozygous and 21 WT fish out 
of 42 selected for vascular abnormalities, P< 0.0001). In the same 
experiment, out of ten WT-looking embryos, eight were mutant and 
two heterozygous for egfl7^s981 (P = 0.0003) (data not shown), supporting 
the observation that the mutant fish are less sensitive to egfl7 MO 
and indicating that the egfl7 MO has minimal off-target effects at these 
concentrations. 
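
The departure from Mendelian proportions can be checked with a short calculation. The sketch below assumes a χ² goodness-of-fit test against the 1:2:1 ratio expected from a heterozygous incross, with two degrees of freedom as stated in the Figure 2 legend; the function name is illustrative. With two degrees of freedom the χ² upper-tail probability has the exact form exp(-χ²/2), so no statistics library is needed.

```python
import math


def mendelian_chi2(wt, het, mut):
    """Chi-square goodness-of-fit test against a 1:2:1 genotype ratio.

    Returns (chi-square statistic, P value with 2 degrees of freedom).
    """
    n = wt + het + mut
    expected = (n / 4, n / 2, n / 4)
    observed = (wt, het, mut)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # For 2 degrees of freedom the survival function is exactly exp(-chi2 / 2).
    p_value = math.exp(-chi2 / 2)
    return chi2, p_value


# 32 phenotype-selected embryos: 17 WT, 12 heterozygous, 3 mutant.
chi2, p = mendelian_chi2(17, 12, 3)
print(f"chi2 = {chi2:.2f}, P = {p:.4f}")   # chi2 = 14.25, P = 0.0008
```

Under the same assumption, the lower-dose experiment (21 WT, 20 heterozygous, 1 mutant) gives P below 0.0001 and the ten WT-looking embryos (0 WT, 2 heterozygous, 8 mutant) give P of about 0.0003, in line with the values quoted in the text.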

To investigate the differences between the mutant and morphant 
phenotypes further, we used an alternative knockdown approach and 
took advantage of the recently developed CRISPR interference 
(CRISPRi) technology”’ to inhibit egfl7 transcript elongation. We 
designed two guide RNAs (gRNAs) to target the non-template strand 
of the 5’ untranslated region (UTR) and exon 2 of egfl7 as well as one 
gRNA targeting the template strand of exon 2 (negative control) 
(Extended Data Fig. 6a). The relative egfl7 expression levels were then 
quantified at 20 hpf using qPCR on pools of ten embryos injected with 
gRNAs and catalytically inactive (dead) CAS9 (dCAS9). Non-template 
gRNAs were able to inhibit egfl7 transcript levels by approximately 
60% compared with uninjected or template gRNA-injected embryos 
Figure 1 | Generation of zebrafish egfl7 mutant 
alleles and sporadic brain haemorrhage in 
mutant larvae. a, Top: egfl7 consists of 9 exons and 
mir126b is embedded in intron 6. The protein is 
encoded by exons 2-9 (grey boxes). a, Bottom: 
Egfl7, 277 amino acids (aa) long, contains a 
signal peptide (blue), an EMI domain (yellow), an 
EGF domain that contains a Delta-Serrate-LAG-2 
(DSL) motif (orange) and a Ca2+-binding EGF 
domain (red). b, Top: the egfl7^s980 lesion (Δ3) leads 
to the deletion of proline at position 50 (p.P50del). 
b, Bottom: the egfl7^s981 allele (Δ4) encodes a 
truncated 77-amino-acid-long polypeptide 
(p.Gln49Leufs*30) that contains a signal peptide 
(blue) and a partial EMI domain (yellow) followed 
by a frameshift leading to a premature stop 
codon. c, d, Brightfield micrographs of 72 hpf WT 
and egfl7^s981/s981 larvae in lateral and ventral 
views. Arrows point to area of haemorrhage. 
e, f, Confocal micrographs of 72 hpf Tg(kdrl:HRAS-mCherry) 
WT and egfl7^s981/s981 larvae in lateral 
and dorsal views. 




(Extended Data Fig. 6b). Tg(kdrl:GFP) embryos injected with gRNAs 
and dCAS9 exhibited different degrees of vascular abnormalities at 
48 hpf, including intersegmental vessel defects, reduced circulatory 
loop and pericardial oedema after non-template but not template 
gRNA injections (Extended Data Fig. 6c). Altogether, these data show 
that transcriptional or translational knockdown of egfl7 can lead to 
severe cardiovascular phenotypes while a severe genetic lesion does not. 

To identify molecules underlying the different phenotypes observed 
in mutants versus morphants, we performed mass spectrometry and 
RNA profiling analyses in egfl7 WT, homozygous mutant (egfl7^s981) 
and morphant embryos at 24 hpf. We assessed the proteomes by 4h 
‘single shot’ liquid chromatography-tandem mass spectrometry (LC- 
MS/MS) and identified more than 6,000 proteins with high repro- 
ducibility (r > 0.90 for biological and technical replicates between 
mutants and WT; Extended Data Fig. 7). To identify significant differences 
in individual protein expression, we used randomization-based 
false discovery rate (FDR) estimation for multiple-testing correction 
and identified only one protein differentially expressed between 
mutants and WT (Fig. 3a). Strong upregulation (more than fivefold) 
was found for Emilin3a, suggesting its possible role in compensation. 
Additionally, we found no significant upregulation of Emilin3a in 
morphants compared with WT (Fig. 3b; Extended Data Fig. 8a), fur- 
ther highlighting Emilin3a as a possible compensating protein. 
Moreover, RNA sequencing (RNA-seq) and qPCR analyses indicated 
that not only emilin3a but also emilin3b and emilin2a were upregu- 
lated in mutants but not in morphants or CRISPRi injected embryos 
(Fig. 3c; Extended Data Fig. 8b). Interestingly, all these proteins contain 
an EMI domain, one of the key units of Egfl7 function, and, like 
Egfl7, can regulate elastogenesis. We then reasoned that if Emilins 
are able to functionally replace Egfl7, they might rescue egfl7 morphants. 
Embryos were injected with egfl7 MO or co-injected with egfl7 
MO and egfl7, egfl7^s981, Emilin2 or Emilin3 mRNA and screened for 
circulatory loop defects at 48 hpf. Similarly to egfl7 mRNA, Emilin2 
and Emilin3 mRNAs were both able to rescue the circulatory defects in 
a significant proportion of egfl7 morphants, while egfl7^s981 mRNA was 
not (Fig. 4). These results support the hypothesis that the upregulation 
of emilin genes in egfl7^s981 mutants is at least partly responsible for 
their lack of phenotype. To test whether the transcriptional changes we 
identified in mutants but not morphants were a peculiarity of the egfl7 
locus, we generated TALEN mutants for vegfaa (data not shown). 


13 AUGUST 2015 | VOL 524 | NATURE | 231 


©2015 Macmillan Publishers Limited. All rights reserved 


LETTER 


[Figure 2 panel labels: a, Tg(kdrl:GFP) embryos of genotypes egfl7+/+, egfl7+/- and egfl7-/-. b, Derivative fluorescence versus temperature (°C), 72-80 °C. c, Percentage of each genotype in Exp 1, Exp 2, Exp 3 and Ctrl; P values shown: 0.0008, 0.0041, 0.0087, 0.9105.] 

Figure 2 | Zebrafish egfl7 mutant embryos are less sensitive to egfl7
morpholino injections. a, Confocal micrographs of Tg(kdrl:GFP) WT,
egfl7^s981/+ and egfl7^s981/s981 48 hpf embryos injected with 1 ng of egfl7 MO
(AS.47) in lateral views. b, High-resolution melt analysis genotyping example
of 32 embryos (from an egfl7^s981/+ incross) selected for vascular defects at
48 hpf, showing the melting curves of 17 egfl7 WT (green), 12 egfl7^s981/+ (blue)
and 3 egfl7^s981/s981 (red) embryos. c, Genotype distribution (at 48 hpf) of
egfl7^s981/+ incross progeny injected with 1 ng of egfl7 MO at the one-cell stage
and subsequently selected for the vascular phenotypes (independent
experiments (Exp) 1, 2 and 3) or randomly selected (Ctrl). The population of
randomly selected embryos follows the expected Mendelian ratio, but the
phenotype-selected populations show significant skewing towards egfl7 WT.
P value represents two-tailed value for a χ2 test with two degrees of freedom;
n = 32 embryos genotyped in each experiment. Note that the egfl7^s981/+
embryos are also underrepresented in the phenotype-selected populations
(corresponding P values for experiments 1, 2 and 3 are 0.0033, 0.0066 and
0.032, respectively).
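The genotype-skewing test described in this legend can be sketched as a goodness-of-fit comparison against the 1:2:1 Mendelian ratio expected from a heterozygous incross. The example below is a minimal sketch, assuming a chi-squared test (the legend states only that the test has two degrees of freedom) and using the panel b counts of 17 WT, 12 heterozygous and 3 homozygous mutant embryos; the function name is illustrative only. For two degrees of freedom the chi-squared survival function reduces to exp(-x/2), so no statistics library is needed.

```python
import math


def chi_square_mendelian(observed, expected_ratio=(1, 2, 1)):
    """Goodness-of-fit of genotype counts (WT, het, hom) to a Mendelian ratio.

    Returns the chi-squared statistic and the P value for 2 degrees of
    freedom, for which the survival function is exactly exp(-chi2 / 2).
    """
    if len(observed) != 3 or len(expected_ratio) != 3:
        raise ValueError("this sketch handles three genotype classes (df = 2)")
    total = sum(observed)
    ratio_sum = sum(expected_ratio)
    expected = [total * r / ratio_sum for r in expected_ratio]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    p_value = math.exp(-chi2 / 2.0)
    return chi2, p_value


# Counts from the panel b example: 17 WT, 12 heterozygous, 3 homozygous mutant.
chi2, p = chi_square_mendelian((17, 12, 3))
print(f"chi2 = {chi2:.2f}, P = {p:.4f}")  # chi2 = 14.25, P = 0.0008
```

With these counts the sketch gives P = 0.0008, which agrees with one of the P values printed in the figure; it is not known from this text which experiment panel b was drawn from.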


Interestingly, qPCR analysis showed that vegfab, a paralogue of vegfaa, 
was upregulated in mutants but not morphants (Extended Data 
Fig. 9a). Additionally, blocking Vegfaa function using a dominant 
negative approach also failed to trigger vegfab upregulation, placing 
the signal triggering compensation upstream of protein function 
(Extended Data Fig. 9b). 

Concerns have been raised over the use of antisense reagents, 
including MOs, as they may cause off-target effects and lead to aber- 
rant conclusions. This debate was recently revived by the generation of 
mutations in many genes whose function was previously studied using 
MOs; strikingly, a majority of the resulting mutants exhibit a different 
phenotype from the one reported for the corresponding morphants; in 
fact, most often the mutants exhibit no obvious phenotype’. In our 
study, we show that, at least for some genes, the phenotypic differences 
between mutants and morphants can be due to the activation of genetic 
Figure 3 | Emilin3a is upregulated in mutant but not in morphant or
CRISPRi embryos. a, Volcano plot showing significantly dysregulated
proteins between 24 hpf egfl7 WT and egfl7^s981 mutant embryos using label-free
quantification. Emilin3a and Emilin3b are highlighted in blue. b, Morphants
did not show a significant upregulation of Emilin3a in unbiased
mass-spectrometry-based proteomics comparing mutants, WT and morphants. A
two-sided t-test was used to assess P values and FDR was controlled by a
randomization-based SAM approach. c, mRNA expression of emilin3a,
emilin3b and emilin2a in egfl7 WT, mutant, morphant and CRISPRi (template
and non-template strand) embryos at 20 hpf; qPCR data, pools of 20-30
embryos each, expression normalized to gapdh (WT expression set at 1 for each
gene). The emilin genes were upregulated in egfl7^s981 mutants but not after
translational or transcriptional inhibition. *P = 0.05.


compensation in the former but not the latter. We show here that the 
upregulation of Emilins can compensate for the loss of Egfl7, but 
anticipate that other genes are involved in this process. On the basis 
of our data, we propose two additional recommendations for using 
MOs: the first is to use doses that do not induce p53 expression as in 
many cases this induction indicates off-target effects; the second is to 
titrate the MO dose so that it does not cause additional phenotypes in a 
null mutant background, as such phenotypes would be due to non-specific
effects. The mechanisms underlying the compensation observed
in mutants but not in morphants are likely to be complex, and so will
be their investigation. Interestingly, we observed no upregulation of the
emilin genes in the Δ3 (s980) allele, suggesting that a non-deleterious
genomic lesion is not sufficient to trigger this response. On the other 
hand, we observed emilin gene upregulation in embryos injected with 
egfl7 TALENs, indicating that a deleterious mutation does not need to go 
through the germline to trigger this response. We also detected emilin 
gene upregulation in embryos carrying only one egfl7^s981 mutant allele
(data not shown). This observation might explain the partial protection 
of heterozygous embryos from egfl7 MO injections (Fig. 2). 

In summary, our data show that, for egfl7, one can identify a dose of 
MO that has no effect in most egfl7 mutant embryos but causes clear 
vascular defects in WT, indicating that these morphant phenotypes are 
not due to off-target effects. Further, egfl7 mutants show no pheno- 
types but exhibit a clear upregulation of several members of the emilin 
gene family. These Emilin proteins share an important functional 
domain with Egfl7, and, probably with additional proteins, can com- 
pensate for the loss of Egfl7 function. Notably, a recent study of the 
Icelandic population identified individuals with homozygous loss-of- 
function mutations in EGFL7 (ref. 22), indicating that compensation 
Figure 4 | Emilin2 and Emilin3 can rescue egfl7 morphants. a, Design of the
rescue experiment. One nanogram of egfl7 MO was injected in WT embryos,
alone or together with 400 pg of mRNA (egfl7 WT, egfl7 Δ4, Emilin2 or
Emilin3). b, Injected embryos were sorted according to their circulatory loop 
phenotype into three classes: normal, slow and absent circulation. Injection of 
egfl7 MO resulted in 69% of embryos lacking circulation at 48 hpf. This 
percentage was reduced to 19% when co-injecting egfl7 mRNA, and to 37% and 
35% when co-injecting Emilin2 and Emilin3 mRNA, respectively. In contrast, 
mRNA from the egfl7 Δ4 mutant allele did not rescue (80% of embryos lacked
circulation). Uninjected siblings are shown for comparison (99% normal). 
Number at the bottom of each bar is the total number of embryos from two 
independent experiments. Error bars, s.e.m. for the ‘absent circulation’ class. 
P value represents two-tailed value for Fisher’s exact test. 


for severe lesions at this locus might also be at play in humans. It will be 
interesting to determine whether the upregulation of EMILIN genes is 
also present in these individuals. Of course, detailed studies will be 
needed to determine whether such compensation is the reason for the 
phenotypic differences between mutants and morphants for other 
genes. More importantly, our study illustrates the power of comparing 
mutants and morphants to identify modifier genes, a goal that remains 
a major challenge in the field of human genetics. 


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
METHODS


No statistical methods were used to predetermine sample size. The experiments 
were not randomized. The investigators were not blinded to allocation during 
experiments and outcome assessment, except for the data shown in Fig. 2 where 
the inherent design of the experiment includes a blinding component. 
Zebrafish handling. All zebrafish husbandry was performed under standard 
conditions in accordance with institutional (UCSF and MPG) and national ethical 
and animal welfare guidelines. 

Confocal microscopy. An LSM 700 confocal laser scanning microscope (Zeiss) 
was used for live imaging. Embryos and larvae were anaesthetized with a low dose 
of tricaine, placed in a glass-bottomed Petri dish (MatTek) with a layer of 1.2% low 
melt agarose and imaged using Plan-Apochromat 10×/0.45 and LCI
Plan-Neofluar 25×/0.8 objective lenses. Vessel integrity and permeability were analysed
using micro-angiography. Fluorescein isothiocyanate (FITC)-dextran, 2,000 kDa 
(Sigma) was injected into the posterior cardinal vein at 48 or 72 hpf and imaged 
after 10 min. 

Plasmids. Total RNA extraction was performed using TRIzol (Life Technologies)
and used for cDNA synthesis using SuperScript second strand (Life
Technologies). cDNAs encoding the Egfl7 WT, Egfl7^s980 and Egfl7^s981 proteins
were PCR-amplified using whole embryo cDNA as template. PCR fragments were
ligated into the mammalian expression vector pcDNA3.1 myc-HIS tag between
EcoRI and XhoI. All constructs were verified by sequencing. pCMV-6 plasmids
containing mouse elastin microfibril interfacer 2 (Emilin2) or mouse elastin
microfibril interfacer 3 (Emilin3) cDNAs were purchased from Origene.
pcDNA3.1 and pCMV-6 plasmids were respectively linearized using SmaI and
AgeI and in vitro transcribed using the mMESSAGE mMACHINE T7 kit (Life
Technologies).

Cell culture and transfection. Authenticated HEK293FT (human embryonic 
kidney) (Life Technologies, R700-07) and HUVEC (human umbilical vein 
endothelial cells) cells were cultured at 37 °C in 5% CO2, 95% air in appropriate
medium containing 10% fetal bovine serum, 100 units per millilitre penicillin and
100 µg ml^-1 streptomycin. HEK293FT cells were used for biochemical studies
because they are easy to grow and transfect, and have been used widely for cell 
biology research. All cell lines are routinely tested for mycoplasma in our facilities, 
and only mycoplasma-free cells were used in this study. Cells were transfected with 
cDNAs in antibiotic-free medium 12-24 h before protein extraction, using
FuGene HD (Roche) at a 3:1 ratio (µl:µg nucleic acid) and 0.18 µg DNA per square
centimetre. Cells were lysed in RIPA buffer and extracellular proteins precipitated 
with TCA (final 20%). Samples were resuspended in Laemmli buffer before gel 
electrophoresis. 
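The transfection amounts follow from simple scaling: DNA mass is set by culture area (0.18 µg per square centimetre) and reagent volume by the 3:1 ratio (µl reagent per µg nucleic acid). A minimal sketch of that arithmetic is given below; the function name and the 10 cm² culture area are hypothetical and chosen only for illustration.

```python
def transfection_amounts(area_cm2, dna_ug_per_cm2=0.18, reagent_ul_per_ug=3.0):
    """Return (DNA in µg, transfection reagent in µl) for a given culture area."""
    if area_cm2 <= 0:
        raise ValueError("culture area must be positive")
    dna_ug = area_cm2 * dna_ug_per_cm2
    reagent_ul = dna_ug * reagent_ul_per_ug
    return dna_ug, reagent_ul


# Hypothetical example: a vessel with 10 cm^2 of growth area.
dna, reagent = transfection_amounts(10.0)
print(f"{dna:.2f} µg DNA, {reagent:.2f} µl reagent")  # 1.80 µg DNA, 5.40 µl reagent
```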

Genome editing. TALENs were designed targeting egfl7 (Extended Data Fig. 1) 
using TALEN targeter’* (https://tale-nt.cac.cornell.edu/) and constructed using 
Golden Gate assembly'*. Zebrafish embryos were injected into the cell at the 
one-cell stage with 100 pg total TALEN RNA. 

Genome engineering was performed as previously described**. The C terminus 
egfl7 TALEN recognition sites are T@CTGGTAGACATCATC and TTGCAGTA 
GTGACTAGT. Between the binding sites is a 17-bp spacer with the ‘tag’ stop 
codon underlined (aggaaaactagacgatc). A ssDNA oligonucleotide (Supplementary 
Table 1) was designed to target the spacer sequence between the cutting sites. A 
Myc-tag sequence, flanked by XhoI restriction sites, was introduced in the centre
of the oligonucleotide resulting in 25-base homology arms on the 5’ and 3’ ends. 
Polyacrylamide gel electrophoresis (PAGE)-purified oligonucleotides were 
obtained from Sigma. One-cell-stage embryos were injected with 100-200 pg 
TALEN mRNA and 75 pg ssDNA donor. Screening for founders was conducted 
using PCR followed by XhoI restriction enzyme digest and subsequently by
sequencing. 
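The spacer description above can be checked with a few lines of code. The sketch below takes the spacer sequence exactly as printed, confirms its length and locates the 'tag' stop codon within it; the helper name is illustrative, and the check says nothing about reading frame, which depends on flanking sequence not given here.

```python
SPACER = "aggaaaactagacgatc"  # spacer between the two TALEN binding sites, as printed


def locate_codon(sequence, codon):
    """Return the 0-based index of the first occurrence of codon, or -1 if absent."""
    return sequence.lower().find(codon.lower())


spacer_length = len(SPACER)
stop_index = locate_codon(SPACER, "tag")
bases_before = stop_index
bases_after = spacer_length - stop_index - 3

print(spacer_length, stop_index, bases_before, bases_after)  # 17 8 8 6
```

The stop codon sits close to the middle of the 17-bp spacer, which is where the TALEN pair is expected to cut and where the Myc-tag donor is centred.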

CRISPR interference. gRNA and CAS9 plasmids” were purchased from 
Addgene. Dead Cas9 was generated using the zebrafish-codon-optimized WT 
Cas9 (pT3TS-nls-zCas9-nls; nls, nuclear localization signal) as a template. The
D10A and H840A mutations were generated using the primers in Supplementary 
Table 1. Site-directed mutagenesis was performed using PfuUltra Fusion HS 
(Agilent). The PCR reaction protocol was 95°C for 1 min, then 18 cycles of 
95 °C for 50s, 60°C for 50s and 68 °C for 1 min per kilobase of plasmid length, 
then 68 °C for 7 min and 4 °C hold. DpnI (1 µl) was added to the PCR reaction and
incubated at 37 °C for 1 h to digest parental DNA, and the product was transformed into competent
cells. CRISPR gRNAs were designed using CRISPR design (http://crispr.mit.edu/)
(Zhang laboratory). Oligonucleotides were annealed in a thermo block at 90-95 °C 
for 5 min followed by a slow cooling to room temperature (~20 °C) and cloned in 
gRNA plasmid between BsmBI sites. All constructs were verified by sequencing.
To make nls-zCas9-nls RNA, the template DNA (pT3TS-nls-zCas9-nls)
was linearized by XbaI digestion and purified using a QIAprep column
(Qiagen). Capped nls-zCas9-nls RNA was synthesized using a mMESSAGE
mMACHINE T3 kit (Life Technologies) and purified using an RNA Clean and
Concentrator kit (Zymo Research). To make gRNA, the template DNA was linearized
by BamHI digestion and purified using a QIAprep column. gRNA was
generated by in vitro transcription using a T7 RNA polymerase MEGAshortscript
T7 kit (Life Technologies). After in vitro transcription, the gRNA (approximately 100
nucleotides long) was purified using an RNA Clean and Concentrator kit (Zymo
Research). dCAS9 mRNA (100-400 pg) and gRNA (50-100 pg) were co-injected 
in the cell at the one-cell stage and at least ten pooled embryos were used to 
evaluate the expression level of the targeted genes by qPCR. Initial experiments 
were performed with gRNAs targeting the tnnt2a gene. In general, a substantial 
increase in knockdown efficiency was observed when combining multiple guides, 
indicating a synergistic effect. The egfl7 gene was knocked down by using two to 
four gRNAs (Supplementary Table 1). 

Microinjection of morpholinos. The ATG morpholinos egfl7 (5'-CAGGTGT 
GTCTGACAGCAGAAAGAG-3’), vegfaa (5'-GTATCAAATAAACAACCAA 
GTTCAT-3’) and tp53 (5’-GCGCCATTGCTTTGCAAGAATTG-3’), were pur- 
chased from GeneTools and injected at the indicated amounts (0.5, 1, 2 or 4 ng for 
egfl7, 2 ng for vegfaa and 1 ng for tp53). To identify the potential effects of p53
induction in egfl7 morphants, we compared the phenotype of embryos co-injected 
with egfl7 and p53 MO or egfl7 MO alone (1 ng for each MO), and did not detect 
any obvious differences. The experiments testing the egfl7 MO effect on egfl7 
mutants were blind (injection into fertilized eggs from an incross of heterozygous 
fish followed by phenotyping and then genotyping). The egfl7 morphant rescue 
experiments were not blind. Sample sizes for these and other experiments were 
determined on the basis of previous experience. 
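Because a morpholino is antisense to its target, the mRNA site it binds is the reverse complement of the oligonucleotide. The sketch below derives that site for the three sequences listed above and reports whether it contains the ATG start codon; the dictionary and function names are illustrative, and the sequences are taken as printed in this text.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

MORPHOLINOS = {
    "egfl7": "CAGGTGTGTCTGACAGCAGAAAGAG",
    "vegfaa": "GTATCAAATAAACAACCAAGTTCAT",
    "tp53": "GCGCCATTGCTTTGCAAGAATTG",
}


def reverse_complement(sequence):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return sequence.upper().translate(COMPLEMENT)[::-1]


# Sense-strand target site for each morpholino (DNA alphabet, 5' to 3').
targets = {name: reverse_complement(seq) for name, seq in MORPHOLINOS.items()}

for name, target in targets.items():
    print(f"{name}: {len(target)} nt, target {target}, contains ATG: {'ATG' in target}")
```

For the vegfaa and tp53 sequences the derived target contains ATG. For the egfl7 sequence as printed it does not, which would be consistent with a binding site immediately upstream of the start codon, although an extraction error in the printed sequence cannot be excluded.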

Genotyping. Embryos or fin-clips were placed in PCR tubes, with 50 µl of elution
buffer (10 mM Tris-Cl, pH 8.5) and 1 mg ml^-1 proteinase K added to each well
and then incubated at 55 °C for 2 h. The samples were then heated to 95 °C
for 10 min to inactivate proteinase K. Primers were designed using primer3:
http://biotools.umassmed.edu/bioapps/primer3_www.cgi. An Eco Real-Time 
PCR System (Illumina) was used for the PCR reactions and high-resolution melt 
analysis. DyNAmo SYBR green (Thermo Fisher Scientific) was used in these 
experiments. PCR reaction protocols were 95 °C for 15 s, then 40 cycles of 95 °C
for 2 s, 60 °C for 2 s and 72 °C for 2 s. Following the PCR, a high-resolution melt
curve was generated by collecting SYBR-green fluorescence data in the 65-95 °C 
range. The analyses were performed on normalized derivative plots. 
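The normalized derivative plot used for high-resolution melt genotyping can be sketched as follows: fluorescence is recorded across the melt range, the negative derivative -dF/dT is taken, and the result is scaled to its maximum so that curves from different wells can be compared. The example uses an idealized sigmoid melt curve with a hypothetical melting temperature of 76 °C; the curve model, step size and function names are assumptions for illustration, not the instrument software's algorithm.

```python
import math


def melt_curve(temps, tm, width=0.6):
    """Idealized fluorescence: fraction of double-stranded product at each temperature."""
    return [1.0 / (1.0 + math.exp((t - tm) / width)) for t in temps]


def negative_derivative(temps, fluorescence):
    """Central-difference estimate of -dF/dT at the interior temperature points."""
    mids = temps[1:-1]
    deriv = [
        -(fluorescence[i + 1] - fluorescence[i - 1]) / (temps[i + 1] - temps[i - 1])
        for i in range(1, len(temps) - 1)
    ]
    return mids, deriv


def normalize_to_peak(values):
    """Scale a derivative trace so that its maximum equals 1."""
    peak = max(values)
    return [v / peak for v in values]


temps = [65.0 + 0.1 * i for i in range(301)]   # 65-95 °C in 0.1 °C steps
fluor = melt_curve(temps, tm=76.0)             # hypothetical amplicon, Tm = 76 °C
mids, deriv = negative_derivative(temps, fluor)
normalized = normalize_to_peak(deriv)
peak_temp = mids[deriv.index(max(deriv))]
print(f"melt peak at {peak_temp:.1f} °C")
```

Alleles that differ by a small deletion shift or reshape this peak, and a heterozygous sample gives a composite of the two profiles, which is what allows the three genotypes to be separated.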
Electrophoresis. Laemmli SDS-PAGE gels consisted of a 4-20% running gel and 
3% stacking gel or Tricine SDS-PAGE 12%. For immunoblots, membranes were 
blocked with 5% non-fat milk and incubated at 4 °C overnight with mouse (SC-40) 
or rabbit (SC-789) anti-Myc antibody (Santa Cruz Biotechnology), or anti-α-tubulin
(T9026, Sigma). Membranes were then rinsed, incubated for 1 h with
horseradish-peroxidase-conjugated anti-rabbit immunoglobulin-G (IgG) or 
anti-mouse IgG (Santa Cruz Biotechnology), rinsed extensively, and labelled pro- 
teins were detected using the Clarity Western substrate (Biorad). 

Mass spectrometry. Embryos (egfl7 WT, egfl7^s981 mutants and morphants) at 20-
24 hpf were lysed in 6 M urea and 2 M thiourea (in HEPES buffer pH = 8.5). The 
lysate was clarified by centrifugation and proteins were subjected to in-solution 
digestion. In brief, proteins were reduced (0.1 M DTT, 30 min at room temper- 
ature) and alkylated (55 mM IAA, 30 min at room temperature in the dark). Lys-C 
was added in a 1:100 enzyme:protein ratio and incubated for 3h. Urea concen- 
tration was diluted to 2M using 50mM ammonium bicarbonate, and trypsin 
(Promega) was added in a 1:100 enzyme:protein ratio. After incubation for 18 h, 
generated peptides were de-salted using the Stop and Go Extraction tip technology 
before mass spectrometric analysis. All WT and egfl7^s981 mutant experiments were
performed at least in technical and biological duplicates. For morphant embryos, 
we measured protein changes in technical triplicates after pooling more than 20 
embryos, thus reducing biological variability. The instrumentation for LC-MS/ 
MS analysis consisted of a nano LC 1000 (Proxeon, now Thermo Scientific) 
coupled via a nano-electrospray ionization source to a quadrupole-based bench- 
top QExactive Plus or QExactive mass spectrometer. Separation of peptides 
according to their hydrophobicities was achieved on a 50cm in-house packed 
column (internal diameter 75 µm, C18 Beads (Dr. Maisch) diameter 1.8 µm) using
a binary buffer system: (A) 0.1% formic acid in H2O and (B) 0.1% formic acid in
80% acetonitrile. A linear gradient within 220 min from 8% to 38% of B, followed 
by an exponential increase to 90% B and a re-equilibration step to 5% B within 
20 min, was used for peptide elution. Mass spectra were acquired at a resolution of 
70,000 (200 m/z) using an AGC target of 1E6 and a maximal injection time of 
20 ms. A top ten method was applied for subsequent acquisition of high-energy 
collision-induced dissociation (HCD) fragmentation MS/MS spectra of the ten 
most intense peaks. Resolution was set to 17,500 at 200 m/z and 5E5 ions (AGC 
target) were collected in the C-trap within a maximal injection time of 60 ms using 
an isolation window of 1.3 thomsons (Th) (1 Th = 1.036426 × 10^-8 kg C^-1).
Raw files were processed using MaxQuant 1.4.1.2 (ref. 27) and the implemented 
Andromeda search engine”*. For peptide assignment, MS/MS HCD fragmentation
spectra were correlated to the Uniprot Danio rerio database (2014). A list of 
common contaminants was included in the searches that were performed with 
tryptic specificity. Default settings were used for MS and MS/MS mass tolerances 
and peptide length. The FDR was set to 1% on protein and peptide levels and 
estimated by the implemented decoy algorithm. Oxidation of methionine residues 
and acetylation on the protein N-term were set as variable modifications, and 
carbamidomethyl at cysteine residues was defined as a fixed modification. The 
match-between-runs, label-free quantification and re-quantification were enabled. 
Statistical analysis and data visualization were done in the environment R. 

The package siggenes from Bioconductor was used to determine significance of 
proteome changes at a FDR cutoff of less than 0.05 (ref. 29). Note that the protein 
list was filtered for at least 50% quantification over all experiments. In label-free 
protein quantification, a common problem is that low abundance proteins are 
likely to be not quantifiable, leading to a right-shifted normal distribution. Thus, 
missing values were replaced along a Gaussian distribution using a log2 downshift
of 1 and a width of 0.4. Imputation was inspected by histograms to mimic a
Gaussian distribution for the complete data set (columnwise) to avoid too high a
frequency of low-intensity values. Significance was assessed by a two-sided t-test of
log2 intensity values. Note that to compare morphants with WT, we used technical
replicates of pooled embryos against all experiments for WT embryos. Five hundred
randomizations were used to estimate FDR, using a cutoff of 0.05 while s0 (the
fudge factor) was defined as 0.1. Protein ratios were calculated by subtracting the
average of the respective groups. Data are shown in Supplementary Table 2. 
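The missing-value imputation described above can be sketched as drawing replacements from a narrowed, down-shifted Gaussian for each sample column. The example is a minimal sketch, assuming that the downshift (1) and width (0.4) are expressed in units of the standard deviation of the observed log2 intensities in that column, which is a common convention for label-free data but is not stated explicitly in the text. The function name and the intensity values are hypothetical.

```python
import random
import statistics


def impute_missing(column, downshift=1.0, width=0.4, rng=None):
    """Replace None entries in one column of log2 intensities.

    Replacements are drawn from a Gaussian centred at
    mean - downshift * sd, with standard deviation width * sd, where mean and
    sd are computed from the observed (non-missing) values of the column.
    """
    rng = rng or random.Random(0)
    observed = [v for v in column if v is not None]
    if len(observed) < 2:
        raise ValueError("need at least two observed values to estimate the spread")
    mean = statistics.mean(observed)
    sd = statistics.stdev(observed)
    return [
        v if v is not None else rng.gauss(mean - downshift * sd, width * sd)
        for v in column
    ]


# Hypothetical log2 LFQ intensities for one sample; None marks a missing value.
sample = [27.1, 25.4, None, 30.2, 24.8, None, 28.6, 26.3]
imputed = impute_missing(sample, rng=random.Random(42))
print([round(v, 2) for v in imputed])
```

Imputing per column keeps the replacement values tied to each run's own intensity distribution, which is what the columnwise histogram inspection mentioned above is meant to verify.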
RNA profiling. Total RNA from egfl7 WT, egfl7^s981 mutants and morphants at
24 hpf was prepared using TRIzol (Life Technologies). RNA profiling was performed
by ZF-screens using an Illumina HiSeq 2500 ultra-high-throughput
sequencing system. Data are shown in Supplementary Table 3. 
qPCR. An Eco Real-Time PCR System (Illumina) was used for qPCR experiments.
Gene expression was normalized relative to gapdh. All reactions were performed in 
technical triplicates; the results represent biological triplicates (unless otherwise 
stated) including the s.e.m. Supplementary Table 1 shows the primers used for 
these experiments. 
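Normalization to gapdh with WT expression set to 1 can be sketched with the comparative Ct calculation. This is an assumption: the text states only that expression was normalized to gapdh, not which formula was used, and the calculation below further assumes ideal amplification efficiency (a doubling per cycle). The Ct values and function name are hypothetical.

```python
def relative_expression(ct_target, ct_reference, ct_target_wt, ct_reference_wt):
    """Fold change relative to WT by the comparative Ct (2^-ddCt) calculation."""
    delta_sample = ct_target - ct_reference      # target normalized to gapdh
    delta_wt = ct_target_wt - ct_reference_wt    # same, in the WT calibrator
    return 2.0 ** -(delta_sample - delta_wt)


# Hypothetical mean Ct values (technical triplicates already averaged).
wt_fold = relative_expression(26.0, 18.0, 26.0, 18.0)
mutant_fold = relative_expression(24.0, 18.0, 26.0, 18.0)
print(wt_fold, mutant_fold)  # 1.0 4.0
```

A target that crosses threshold two cycles earlier in the mutant, at equal gapdh Ct, corresponds to a fourfold increase relative to WT under these assumptions.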

Additional data. Mature miR126 levels were quantified in WT and egfl7^s981/s981
embryos using the miRNA QRT-PCR Detection Kit (Agilent). No significant
changes were observed at 48 or 72 hpf. Maternal zygotic egfl7^s981 mutant embryos
were generated by incrossing homozygous mutant adults; they exhibited no addi- 
tional phenotypes compared with zygotic mutant embryos. In addition, we 
observed no evidence of maternal egfl7 mRNA by RT-PCR. Mutant samples for 
proteomics, RNA-seq and qPCR analyses were MZ mutants. 
Extended Data Figure 1 | Generation and identification of zebrafish egfl7
mutant alleles. a, TALENs were designed to target exon 3 of egfl7 which
encodes part of the EMI domain. Sequence alignment of part of exon 3 from
the WT, egfl7^s980 and egfl7^s981 alleles shows TALEN indels: Δ3/s980 (three
nucleotide deletion) and Δ4/s981 (five nucleotide deletion), and one nucleotide
insertion (yellow). b, Genotyping example of single embryos sampled from a
population of egfl7 WT, egfl7^s981/+ and egfl7^s981/s981 fish using high-resolution
melt analysis. The green curve corresponds to the WT allele and the red one
to the egfl7^s981 allele. Heterozygous embryos have both alleles and thus the
melting profile (in blue) is a composition of the WT and mutant curves.
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Extended Data Figure 2| The egfl7***' mutation leads to egfl7 mRNA 
degradation, reduced protein expression and impaired secretion. a, The 
egfl7?*" mutation leads to egfl7 mRNA degradation: egfl7 mRNA expression in 
24 hpf WT, egfl7°8""8! and egfl7788° embryos. Expression normalized to 
gapdh. b, The egfl7**" (p.Gln49Leufs*30) mutation leads to strongly reduced 
protein expression. Western blot analyses of Egfl7-Myc-tag expression in 
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transfected HUVEC cells. Egfl7 WT and Egfl7**? protein expression was 
strongly detected in the medium whereas the Egfl7*" isoform was strongly 
reduced in the cells and very poorly secreted (right), or undetectable in both 
(left). Furthermore, Egfl7*' shares high similarity to the truncated protein 
produced in the original Egfl7 mutant mouse in which the protein was not 
detectable using an Egfl7 antibody”®. 
Extended Data Figure 3 | Vessel integrity and permeability do not
appear to be affected in egfl7^s981/s981 larvae. A fluorescent molecule
(2000 kDa FITC-dextran) was injected directly into the circulation of 72 hpf
Tg(kdrl:HRAS:mCherry) larvae that previously showed haemorrhage, which
was mostly localized around the hindbrain ventricle. Confocal micrographs
of 72 hpf Tg(kdrl:HRAS:mCherry) expression, FITC-dextran and MERGE of
WT and egfl7^s981/s981 larvae in (a) lateral and (b) dorsal views. The
FITC-dextran did not accumulate to the sites of haemorrhage, suggesting that these
sites had clotted and vascular integrity had been restored after the initial
blood leakage.
Extended Data Figure 4 | In vivo genome editing: Myc-tag introduction
in the egfl7 endogenous locus. a, TALENs targeting the egfl7 stop codon
created double-stranded breaks in the chromosomal DNA. Homology-directed
repair precisely incorporated the Myc tag exogenous sequence (ssDNA) at the
cut site. b, Western blot analysis of Egfl7-Myc-tag expression in 24 hpf control
and morphant embryos. Egfl7 Myc-tag signal was reduced by around 80%
in morphants (1 ng egfl7 MO) compared with uninjected. Expression
normalized to tubulin (P = 0.05). Error bars, s.e.m. (n = 3).
Extended Data Figure 5 | The egfl7 morpholino does not significantly
affect p53 mRNA expression at 1 ng per embryo but it does so at higher
doses. mRNA expression of p53 in 24 hpf WT, egfl7 Δ3 (egfl7^s980) and egfl7
Δ4 (egfl7^s981) mutant, and morphant (1, 2 and 4 ng injected) embryos.
Expression normalized to gapdh. Error bars, s.e.m. of technical triplicates.
Extended Data Figure 6 | The egfl7 transcript elongation inhibition causes
a phenotype similar to the one seen in morphants. a, gRNAs of egfl7 targeting
the template (T) strand in exon 2 and non-template (NT) strand in the 5’
UTR and exon 2. b, Expression of egfl7 in non-template (NT) gRNA and
template (T) gRNA-injected embryos relative to uninjected (CT) siblings at
20 hpf. qPCR data, pools of ten embryos each, expression normalized to gapdh
(P = 0.05). Error bars, s.e.m. (n = 3). c, Lateral view confocal micrographs
of 48 hpf Tg(kdrl:GFP) embryos injected with egfl7 template and non-template
CRISPRi. Template CRISPRi (top) embryos are indistinguishable from
non-injected siblings, while non-template CRISPRi embryos exhibit different
degrees of vascular defects (middle: mild; bottom: severe).
Extended Data Figure 7 | Single-shot proteomics to assess changes between
WT and egfl7^s981 mutant embryos. a, Schematic visualization of proteomic
workflow. Embryos were lysed in urea buffer, and proteins were digested
in-solution using trypsin and measured on a QExactive bench top instrument.
Acquired spectra were analysed against the Uniprot zebrafish database (2014)
using MaxQuant. b, Scatter plot matrix shows high correlation between
biological replicates. Reproducibility was determined by a Pearson correlation
coefficient.
Extended Data Figure 8 | Emilin3a expression is upregulated in mutant but
not morphant embryos. a, Volcano plot showing significantly dysregulated
proteins between egfl7 morphant and WT embryos at 24 hpf using label-free
quantification. Emilin3a (blue) levels were not significantly different
between morphant and WT embryos. Emilin3b is also highlighted in blue.
b, Bar plot showing upregulation of emilin family members in 24 hpf egfl7
mutants compared with WT and morphants, as assessed from RNA-seq data
(WT expression set at 1 for each gene).
Extended Data Figure 9 | Expression of vegfab is upregulated in vegfaa
mutant embryos but not in morphants, or vegfaa dominant negative-injected
embryos; qPCR data, pools of ten embryos each, expression
normalized to gapdh (P = 0.05). Error bars, s.e.m. (n = 5). a, mRNA
expression of vegfab in 24 hpf vegfaa WT, mutant and morphant embryos. 
b, mRNA expression of vegfab in 24 hpf vegfaa WT and vegfaa dominant 
negative-injected embryos (two different dominant negatives were injected). 
Metabolic rescue in pluripotent cells from
patients with mtDNA disease

Mitochondria have a major role in energy production via oxidative
phosphorylation’, which is dependent on the expression of critical 
genes encoded by mitochondrial (mt)DNA. Mutations in mtDNA 
can cause fatal or severely debilitating disorders with limited treat- 
ment options’. Clinical manifestations vary based on mutation 
type and heteroplasmy (that is, the relative levels of mutant and 
wild-type mtDNA within each cell)**. Here we generated genetic- 
ally corrected pluripotent stem cells (PSCs) from patients with 
mtDNA disease. Multiple induced pluripotent stem (iPS) cell lines 
were derived from patients with common heteroplasmic mutations 
including 3243A>G, causing mitochondrial encephalomyopathy 
and stroke-like episodes (MELAS)’, and 8993T>G and 13513G>A, 
implicated in Leigh syndrome. Isogenic MELAS and Leigh syn- 
drome iPS cell lines were generated containing exclusively wild- 
type or mutant mtDNA through spontaneous segregation of 
heteroplasmic mtDNA in proliferating fibroblasts. Furthermore, 
somatic cell nuclear transfer (SCNT) enabled replacement of 
mutant mtDNA from homoplasmic 8993T>G fibroblasts to gen- 
erate corrected Leigh-NT1 PSCs. Although Leigh-NT1 PSCs con- 
tained donor oocyte wild-type mtDNA (human haplotype D4a) 
that differed from Leigh syndrome patient haplotype (F1a) at a
total of 47 nucleotide sites, Leigh-NT1 cells displayed transcrip- 
tomic profiles similar to those in embryo-derived PSCs carrying 
wild-type mtDNA, indicative of normal nuclear-to-mitochondrial 
interactions. Moreover, genetically rescued patient PSCs displayed 
normal metabolic function compared to impaired oxygen con- 
sumption and ATP production observed in mutant cells. We con- 
clude that both reprogramming approaches offer complementary 
strategies for derivation of PSCs containing exclusively wild-type 
mtDNA, through spontaneous segregation of heteroplasmic 
mtDNA in individual iPS cell lines or mitochondrial replacement 
by SCNT in homoplasmic mtDNA-based disease. 

Maternally inherited mtDNA encodes 13 proteins critical for 
oxidative phosphorylation, while the remaining protein subunits are 
encoded by nuclear DNA. Therefore, mitochondrial biogenesis 
requires coordinated interaction of protein subunits encoded by both 
genomes’. Mutations in mtDNA occur at a higher rate than in nuclear 
DNA, resulting in life-threatening conditions*®. 

We have described a strategy to prevent transmission of mtDNA 
mutations to children involving mitochondrial replacement’. To 
explore the feasibility of generating genetically corrected autologous 
PSCs, herein, we focus on three of the most common pathogenic 
mtDNA mutations. Skin samples were donated by a MELAS patient 
carrying a 3243A>G heteroplasmic mutation in tRNA^Leu(UUR) (MT-TL1)°
and by Leigh syndrome patients carrying heteroplasmic or homoplasmic
8993T>G mutations affecting the ATPase 6 gene (MT-ATP6)’,
and heteroplasmic 13513G>A mutation in the MT-ND5 gene’. A panel
of ten iPS cell lines from each mutation type was generated and quantitative
mtDNA mutation analysis was carried out using amplification
refractory mutation system-quantitative polymerase chain reaction
(ARMS-qPCR), with a detection threshold of 0.5%. In MELAS iPS cell
lines, the mutation was undetectable in five lines and varied from 33% 
to 100% in the remaining five lines, compared to 29% heteroplasmy in 
parental fibroblasts (Table 1 and Extended Data Fig. 1a). In iPS cell lines 
from the heteroplasmic 8993T>G mutation, the mutation was unde- 
tectable in one line and ranged from 29% to 87% in the remaining lines, 
compared to 52% heteroplasmy in parental fibroblasts (Table 1 and 
Extended Data Fig. 1b). Mutation segregation in individual iPS cell lines 
from 13513G>A fibroblasts also ranged from 0% to 100%, compared 
to 84% heteroplasmy in fibroblasts (Table 1 and Extended Data Fig. 1c). 
Previous studies suggested that segregation of heteroplasmic mtDNA is 
specific to iPS cells and may occur during or after reprogramming’. 
To explore mechanisms, parental fibroblasts carrying 3243A>G and 
13513G>A mutations were subcloned and mutation loads in indi- 
vidual clones were analysed. Among ten randomly selected MELAS 
samples, five were homoplasmic containing either wild type (A) or 
mutant (G) at the 3243 position. The remaining five contained varying 
heteroplasmy levels similar to iPS cells (Table 1 and Extended Data 
Fig. 1d). Variable heteroplasmy levels were also observed in 13513G>A 
fibroblasts including homoplasmic mutant and wild-type clones 
(Table 1). Thus, segregation of heteroplasmic mtDNA mutations 
occurs in skin fibroblasts and may reflect a common phenomenon”. 
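The clone-level summary in this paragraph can be reproduced by classifying each clone's mutation load against the 0.5% detection threshold. The sketch below uses the 3243A>G columns of Table 1, reading the garbled entries in the extracted table as 0 (an assumption that is consistent with the counts quoted in the text); the function and variable names are illustrative.

```python
from collections import Counter

DETECTION_THRESHOLD = 0.5  # percent, the stated ARMS-qPCR detection threshold


def classify_clone(mutant_percent, threshold=DETECTION_THRESHOLD):
    """Label a clone by its mutant mtDNA load (percentage of total mtDNA)."""
    if mutant_percent < threshold:
        return "homoplasmic wild-type"
    if mutant_percent > 100.0 - threshold:
        return "homoplasmic mutant"
    return "heteroplasmic"


# 3243A>G mutant load (%) per clone from Table 1; garbled entries read as 0.
melas_ips = [100, 0, 100, 0, 0, 33, 0, 78, 88, 0]
melas_fib = [100, 100, 0, 93, 8, 21, 3, 97, 100, 0]

ips_summary = Counter(classify_clone(v) for v in melas_ips)
fib_summary = Counter(classify_clone(v) for v in melas_fib)
print(dict(ips_summary))
print(dict(fib_summary))
```

Under this reading, five iPS cell lines carry no detectable mutation, and five fibroblast clones are homoplasmic (three mutant, two wild-type) with the other five heteroplasmic, matching the figures given in the text.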

Isogenic MELAS iPS cell lines carrying wild-type or mutant
mtDNA maintained typical PSC morphology and formed teratomas 
containing cells and tissues from all three germ layers (Extended Data 
Fig. 2a, b). We next carried out whole mtDNA sequencing using the 
Illumina MiSeq platform and confirmed the 3243A>G mutation in 
parental MELAS-fib (46.8%), MELAS-iPS1 and MELAS-iPS3 (100%) 
cells while MELAS-iPS2 was homoplasmic for the wild-type allele 
(Supplementary Table 1). MELAS-fib also carried four additional het- 
eroplasmic mutations with one variant carried to MELAS-iPS1 and 
MELAS-iPS2 (Supplementary Table 1). 

The 3243A>G mutation perturbs tRNA^Leu(UUR) function and impairs
mitochondrial protein synthesis as well as respiratory complex activity, 
with the homoplasmic mutation leading to prenatal lethality in 
humans’’. Oxygen consumption rate (OCR) was employed as an indi- 
cator of mitochondrial respiration and energy production. Mutant 
MELAS-iPS1 and MELAS-iPS3 exhibited significantly lower OCR 
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Table 1 | Distribution of mtDNA variants in fibroblast and iPS clones derived from patients with heteroplasmic mutations 


iPS clones Fibroblast clones 
3243 Mutant G (%) 8993 Mutant G (%) 13513 Mutant A (%) 3243 Mutant G (%) 13513 Mutant A (%) 
A>G T>G G>A A>G G>A 
Parental 29 Parental 52 Parental 84 Parental 29 Parental 84 

fibroblasts fibroblasts fibroblasts fibroblasts fibroblasts 
iPS1 100 iPS1 62 iPS1 100 fib1 100 fib1 0 
iPS2 0) iPS2 72 iPS2 2 fib2 100 fib2 68 
iPS3 100 iPS3 32 iPS3 4 fib3 ) fib3 24 
iPS4 0) iPS4 52 iPS4 ) fib4 93 fib4 64 
iPS5 fe) iPS5 29 iPS5 80 fib5 8 fib5 58 
iPS6 33 iPS6 66 iPS6 11 fib6 21 fib6 48 
IPS7 0) iPS7 87 iPS7 19 fib7 3 fib7 69 
iPS8 78 iPS8 72 iPS8 32 fib8 97 fib8 70 
iPS9 88 iPS9 46 iPS9 100 fib9 100 fib9 63 
iPS10 0) iPS10 ) iPS10 V2 fib10 ) fib10 100 


(P<0.05) when compared to the wild-type MELAS-iPS2 (Fig. la). 
Fibroblasts differentiated from MELAS-iPS1 and parental MELAS- 
fib also displayed low levels of mitochondrial function. In contrast, 
these respiratory defects were absent in MELAS-iPS2-derived fibro- 
blasts. In general, mitochondrial respiration correlated with the het- 
eroplasmy levels in cells (Fig. 1b and Table 1). The greater reliance on 
oxidative metabolism in wild-type MELAS-iPS2 was confirmed by the 
elevated OCR to ECAR (extracellular acidification rate) ratio, which 
provides a measure for the relative contribution of oxidative metabol- 
ism versus glycolysis (Extended Data Fig. 3a). Mutant MELAS iPS cells 
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Figure 1 | Mitochondrial respiratory function in MELAS samples. 

a, Oxygen consumption rate (OCR) in undifferentiated MELAS-iPS1, MELAS- 
iPS2 and MELAS-iPS3 cells (n = 9 per cell line, biological replicates) in 
response to 0.5 1g ml * oligomyocin, 1 {tM fluorocarbonyl cyanide phenyl- 


hydrazone (FCCP), 0.5 .M rotenone and 1 1M antimycin. Wild-type MELAS- 
iPS2 displayed higher levels of oxygen consumption when compared to mutant 


and their derivatives displayed significantly decreased OCR/ECAR 
ratios, indicating a greater reliance on glycolysis (Extended Data 
Fig. 3a, b). We next differentiated MELAS iPS cells into neuronal 
progenitor cells (NPCs, Extended Data Fig. 3c, d)!*'*. Diminished 
metabolic profiles in mutant NPCs recapitulated those observed in 
undifferentiated iPS cells (Extended Data Fig. 3e). Cardiomyocyte 
differentiation’ of mutant MELAS iPS cells was severely compro- 
mised due to massive cell death. 

As expected, all iPS cell lines from homoplasmic 8993T>G fibro- 
blasts carried mutant mtDNA (Extended Data Fig. le and Extended 
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MELAS-iPS1 and MELAS-iPS3. b, OCR in MELAS-iPS1 and MELAS-iPS2 
derived fibroblasts and parental MELAS-fib (n = 10 per cell line, biological 
replicates). Error bars are mean = s.e.m. and OCR data are representative of 
at least 2-3 independent experiments. Significance established with one-way 
analysis of variance (ANOVA) with Tukey’s multiple comparison test. 
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Table 2 | Summary of 47 SNPs found in the mtDNA of Leigh-fib and Leigh-NT1 lines 


Nucleotide position Leigh-NT1 Leigh-fib Locus Effects Nucleotide position Leigh-NT1 Leigh-fib Locus Effects 
152 C. I Control region 10400 1 Cc MT-ND3 Syn 
248 A Deletion Control region = 10410 Cc a MT-TR _ 
489 Cc Ti Control region = 10609 T C MT-ND4L Non-syn 

3010 A G MT-RNR2 - 10873 Cc il; MT-ND4 Syn 
3206 T Cc MT-RNR2 - 12406 G A MT-ND5 Non-syn 
3970 Cc T MT-ND1 Syn 12418 Deletion A MT-ND5 Frame shift 
4086 Cc T MT-ND1 Syn 12705 ir Cc MT-ND5 Syn 
4216 T Cc MT-ND1 Non-syn 12882 Cc i: MT-ND5 Syn 
4883 Cc MT-ND2 Syn 13759 G A MT-ND5 Non-syn 
5178 A Cc MT-ND2 Non-syn 13928 G Cc MT-ND5 Non-syn 
6392 T Cc MT-CO1 Syn 14668 T Cc MT-ND6 Syn 
6527 A G MT-CO1 Syn 14783 Cc iT MT-CYB Syn 
6962 G A MT-CO1 Syn 14979 Cc T MT-CYB Non-syn 
7775 A G MT-CO2 Non-syn 15043 A G MT-CYB Syn 
8414 T Cc MT-ATP8 Non-syn 15301 A G MT-CYB Syn 
8473 C ii MT-ATP8 Syn 15676 T Cc MT-CYB Syn 
8507 A G MT-ATP8 Non-syn 16148 Cc vi Control region - 
8701 G A MT-ATP6 Non-syn 16162 A G Control region = 
8993 i G MT-ATP6 Non-syn 16172 T Cc Control region = 
9053 G A MT-ATP6 Non-syn 16223 T Cc Control region - 
9540 C Ti MT-CO3 Syn 16244 G A Control region = 
9548 G A MT-CO3 Syn 16304 T Cc Control region = 

10310 G A MT-ND3 Syn 16362 Cc il: Control region = 

10398 G A MT-ND3 Non-syn 


Syn, synonymous; non-syn, non-synonymous. 


Data Table 1). Therefore, we pursued mitochondrial replacement by 
SCNT with wild-type oocyte mitochondria. Following our reported 
protocol, two stable nuclear transfer-embryonic stem cell lines were 
established (Leigh-NT1 and Leigh-NT2)’°. Genotyping confirmed that 
both lines contained predominantly oocyte wild-type mtDNA 
(Extended Data Fig. 1f) with limited low mutated mtDNA carryover 
(<1%) at passage 5 that became undetectable upon extended propaga- 
tion (Extended Data Table 2). 

Cytogenetic G-banding revealed that Leigh-iPS1 and Leigh-NT1 
retained normal diploid karyotypes with no detectable numerical or 
structural chromosomal abnormalities (Extended Data Fig. 2c). 
However, Leigh-NT2 showed a XXXY tetraploid karyotype 
(Extended Data Fig. 2c). Fingerprinting by short tandem repeat ana- 
lysis (STR) also revealed that Leigh-NT2 contained both oocyte and 
Leigh-fib alleles (Extended Data Table 3), consistent with failed enuc- 
leation. STR profiles for Leigh-NT1 and Leigh-iPS1 were identical to 
Leigh-fib (Extended Data Table 3). Both Leigh-iPS1 and Leigh-NT1 
lines maintained typical PSC morphology, expressed pluripotency 
markers’ and formed teratomas containing cells and tissues from all 
three germ layers (Extended Data Fig. 2d, e). 
Figure 2 | Restoration of mitochondrial respiratory function in Leigh-NT1.
a, OCR in Leigh-NT1, Leigh-iPS1 and Leigh-iPS2 derived fibroblasts, parental
and oocyte donor fibroblasts (n = 9, 8, 10, 9 and 8 per cell line, respectively,
biological replicates). b, OCR in Leigh-NT1 and Leigh-iPS1 derived skeletal
muscle cells (n = 6 biological replicates per cell line). c, mtDNA haplotype
distances for oocyte and somatic cell donors based on mitochondrial
phylogenetic tree from PhyloTree (http://phylotree.org/tree/main.htm)”.
Asterisks indicate subgroups for mtDNA haplotypes. Error bars are
mean ± s.e.m. and OCR data are representative of at least 2-3 independent
experiments. Significance established using Kruskal-Wallis with Dunn’s
multiple comparison test or one-way ANOVA with Tukey’s multiple
comparison test (maximum respiration in a) and Student’s t-test in b.
Figure 3 | Global gene expression analysis by RNA-seq. a, Heat map displaying
345 genes differentially expressed (adjusted P value <0.05) between wild-type
PSCs (biological duplicates of 3 independent cell lines) versus mutant
PSCs (biological duplicates of 5 independent cell lines). b, Functional
enrichment analysis of genes displayed in the heat map that are known to be
correlated with a response to oxidative stress. c, Functional enrichment
analysis of genes displayed in the heat map that are known to be
correlated with a response to hypoxia. Bar graphs are mean ± s.e.m., using
all samples described in a.

Whole mtDNA sequencing confirmed the presence of the homoplasmic
8993T>G mutation in Leigh-fib and Leigh-iPS1 and also identified
a second homoplasmic 4216T>C mutation in the MT-ND1 
gene (Extended Data Fig. 1f and Supplementary Table 2). This non- 
synonymous mutation has been previously associated with Leber’s 
hereditary optic neuropathy’’. Leigh-NT1 mtDNA sequence differed 
from Leigh-fib at 47 nucleotide positions. These differences comprised 10 single 
nucleotide polymorphisms (SNPs) in the D-loop region, 2 in the 16S 
rRNA gene, 1 in the tRNA-R gene and 34 in protein genes, the last group 
including the pathogenic 8993T>G and 4216T>C mutations (Table 2). 
We also detected two heteroplasmic variants in Leigh-NT1, four in 
Leigh-fib and three in Leigh-iPS1 (Supplementary Table 2). Clinical 
symptoms associated with these variants have not been reported. 

We measured metabolic function in fibroblasts differentiated 
from Leigh-NT1, Leigh-iPS1 and Leigh-iPS2 and compared them to 
parental Leigh-fib and healthy skin fibroblasts from the oocyte donor. 
As expected, the homoplasmic 8993T>G mutation resulted in low 
mitochondrial oxidative capacity. In contrast, these respiration defects 
were absent in fibroblasts differentiated from Leigh-NT1 (Fig. 2a). 
Leigh-NT1 also displayed a metabolic profile and OCR/ECAR ratios 
similar to oocyte donor fibroblasts (Fig. 2a and Extended Data Fig. 4a). 
We observed varying levels of oxidative reserve for Leigh-iPS1 
and Leigh-iPS2 compared to parental Leigh-fib (Fig. 2a), reflecting 
inherent variability within differentiated fibroblast populations. 
Both Leigh-NT1 and Leigh-iPS1 effectively generated skeletal muscle 
cells’? (Extended Data Fig. 4b), with Leigh-iPS1 skeletal muscle cells 
displaying significantly lower ATP turnover (P< 0.05) (Fig. 2b). 
Extensive cell death was observed in Leigh-iPS1 during directed car- 
diomyocyte differentiation (Extended Data Fig. 4c). These results 
demonstrate complete functional rescue of mitochondrial activity in 
Leigh-NT1 through restoration of the wild-type mtDNA. 

Evolution of mtDNA has resulted in a series of neutral polymorphic 
variants within the human population often associated with regional 
migration and adaptation to climate’*. The largest difference between 
distant human mtDNA haplotypes has been estimated at 95 SNPs”. 
In the present study, phylogenetic analysis assigned oocyte and 
Leigh-NT1 mtDNA to the D4a haplotype while the Leigh-iPS1 and 
Leigh-fib mtDNA haplotype was F1a (Supplementary Table 2). 
D4a descends from the M macro-haplogroup, while F1a descends from the N 
macro-haplogroup, per the human mtDNA mutation tree (Fig. 2c). 
Safety evaluations of mitochondrial replacement therapy suggest pos- 
sible harmful secondary outcomes reflecting nuclear-mitochondrial 
incompatibility". Despite ‘unmatched’ donor mtDNA, Leigh-NT1 
demonstrated lineage-specific differentiation and restoration of meta- 
bolic activity, implying normal nuclear-mitochondrial interaction. 
We further investigated a hESO-NT1 derived by SCNT from healthy 
fetal fibroblasts (human dermal fibroblast (HDF)) and IVF-derived 
hESO-8 carrying identical mtDNA‘. hESO-NT1 mtDNA differed 
from HDF at 12 nucleotide positions (Fig. 2c and Supplementary 
Table 3). NPCs and cardiomyocytes differentiated from hESO-NT1 and 
hESO-8 displayed similar metabolic profiles (Extended Data Fig. 5a-e). 
Next, we asked whether the 3243A>G 
and 8993T>G mutations induced detectable changes in global gene 
expression and compared transcriptomes by RNA-seq for undiffer- 
entiated and differentiated PSCs. Undifferentiated PSCs containing 
wild-type or mutant mtDNA showed 154 differentially expressed 
transcripts (adjusted P value <0.05). This small number of differences is 
consistent with the predominantly glycolytic metabolism of pluripo- 
tent stem cells, which protects them from the deleterious effects of 
mtDNA mutations. Global gene expression analysis of fibroblasts 
differentiated from isogenic MELAS lines identified 1,118 differentially 
expressed genes between mutant and wild-type cells (Extended Data 
Fig. 6a), whereas 2,950 genes were differentially expressed in fibro- 
blasts differentiated from mutant Leigh-iPS1, Leigh-iPS2 and Leigh- 
iPS3 compared to the wild-type Leigh-NT1 cells (Extended Data 
Fig. 6b). Hierarchical clustering using a multiple bootstrap resampling 
algorithm showed that the Leigh-NT1 fibroblasts were similar to 
hESO-NT1, hESO-NT2, hESO-7 and hESO-8 fibroblasts (Extended 
Data Fig. 6c). These findings further support the notion that oocyte 
mtDNA in Leigh-NT1 interacts normally with nuclear DNA as long as 
the mtDNA sequence differences are neutral. 

Next, we asked whether any of the differentially expressed genes 
were common to both 3243A>G and 8993T>G mutations, and found 
345 genes that were shared, 96% of which were overexpressed in 
the mutant cells (Fig. 3a). Functional enrichment analysis identified 
genes associated with a response to hypoxia and oxidative stress”*** 
(Fig. 3b, c; P value <0.001). However, we did not observe an enrich- 
ment of genes associated with metabolism, stress response, epigenetic 
regulation, and cell signalling, which was reported in a recent MELAS 
study’’ (Extended Data Fig. 6d). 

Finally, we addressed whether the 3243A>G and 8993T>G 
mutations specifically impact gene expression of mtDNA-encoded 
transcripts”. We found that transcripts expressed from mtDNA 
accounted for approximately 20% of the total cellular transcriptome, 
with similar expression levels across different mutations (Extended 
Data Fig. 7; adjusted P value >0.05). 

We demonstrate complementary strategies for generating genetic- 
ally and functionally corrected PSCs for patients with mtDNA disease. 
For the most common mtDNA syndromes caused by heteroplasmic 
mutations, generation of multiple iPS cell lines allows recovery of clones 
with exclusively wild-type mtDNA due to spontaneous segregation of 
heteroplasmic mtDNA. SCNT enables correction of homoplasmic 
mutations through replacement with donor mtDNA, and generation 
of PSCs with transcriptional and epigenetic profiles similar to embryo- 
derived embryonic stem cells”®. Recovery of metabolic function despite 
haplotype differences between patient and donor mtDNA suggests that 
normal nuclear-to-mitochondrial interactions are highly conserved 
within species. Generation of genetically corrected PSCs from patients 
with mtDNA disease enables the transition from palliative care to 
therapeutic interventions based on regenerative medicine. 


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
METHODS 


The study protocols and informed consent for human subjects were approved 
by the OHSU Embryonic Stem Cell Research Oversight Committee and the 
Institutional Review Board. No statistical methods were used to predetermine 
sample size. 

SCNT and iPS cell derivation and culture. Fibroblasts were acquired from 
Coriell Cell Repositories or donated by patients directly for our study. 
Fibroblasts were cultured in DMEM F12 medium supplemented with 10% 
fetal bovine serum (HyClone) and 50 μM uridine. SCNT procedures were 
performed as described previously'®. Sendai virus-based reprogramming was carried 
out according to the manufacturer’s protocol (CytoTune-iPS Reprogramming Kit, 
Life Technologies). Colonies with typical iPS cell morphology were isolated and 
manually propagated as described previously*® in Knockout DMEM medium 
(Invitrogen) supplemented with 20% knockout serum replacement (Invitrogen), 
0.1 mM nonessential amino acids (Invitrogen), 1 mM L-glutamine (Invitrogen), 
0.1 mM β-mercaptoethanol (Sigma), 1× penicillin-streptomycin (Invitrogen) 
and 4 ng ml−1 basic fibroblast growth factor (Sigma). All cell cultures were free 
of mycoplasma contamination. Origin of all cell lines has been authenticated by 
STR and mtDNA genotyping. 

Fibroblast differentiation. Differentiation of PSCs to fibroblasts was induced 
by culture in fibroblast medium (DMEM F12 with 10% FBS) for 2-3 weeks in 
absence of mouse embryonic fibroblast (mEF) feeder layers. Resulting differentiated 
cells were FACS sorted for TRA-1-60 (BD Biosciences), SSEA4 (Santa 
Cruz), CD56− (BD Biosciences) and CD13+ (BD Biosciences) cells?®. The CD13+ 
cells were further expanded in the fibroblast medium. 

NPC differentiation and culture. For NPC differentiation, a published protocol" 
was followed with minor modifications. PSCs were collected using collagenase IV 
(Life Technologies), washed twice with 1× DPBS without calcium and magnesium 
(Corning Cellgro), and cultured in Neural Induction Medium 1 (NIM-1: 50% 
Advanced DMEM/F12 (Invitrogen), 50% Neurobasal (Invitrogen), 1× B27 
(Invitrogen), 1× N2 (Invitrogen), 2 mM GlutaMAX (Invitrogen) supplemented 
with 10 ng ml−1 hLIF (Peprotech), 4 μM CHIR99021 (Selleckchem), 3 μM 
SB431542 (Selleckchem), 2 μM dorsomorphin (Sigma), and 0.1 μM Compound 
E (EMD Chemicals Inc.)). Cells were cultured in NIM-1 medium for 2 days with 
daily medium change and then switched to Neural Induction Medium 2 (NIM-2: 
50% Advanced DMEM/F12, 50% Neurobasal, 1× N2, 1× B27, 2 mM GlutaMAX 
and 10 ng ml−1 hLIF, 4 μM CHIR99021, 3 μM SB431542 and 0.1 μM Compound 
E). After 5 days culture in NIM-2 (daily medium change), cells were treated with 
10 μM Y27632 (Selleckchem) for 1 h and ‘dome’-shaped colonies were manually 
picked and treated with Accumax (Innovative Cell Technologies) for 10 min at 
37 °C. Cells were then gently pipetted to obtain single cell suspension and replated 
onto Matrigel-coated 6-well plates at a density of 3.5 × 10° per cm2 in Neural 
Progenitor cell Maintenance Medium (NPMM: 50% Advanced DMEM/F12, 
50% Neurobasal, 1× B27, 1× N2, 2 mM GlutaMAX, 10 ng ml−1 hLIF, 3 μM 
CHIR99021 and 2 μM SB431542) supplemented with 10 μM Y27632. NPCs were 
maintained on Matrigel-coated dishes in NPMM with daily medium change and 
passaged upon reaching 70% to 80% confluence using Accumax. 

Skeletal muscle differentiation. Skeletal muscle differentiation was based on a 
previous report with minor modifications’’. Briefly, PSCs plated on Matrigel- 
coated plates were grown to 40% confluence in mTeSR1 medium and then 
switched to Skeletal Muscle Induction Medium (SMIM: DMEM/F12, ITS, 3 μM 
CHIR99021). After 4 days culture in SMIM with daily medium change, cells were 
cultured in Skeletal Muscle Expansion Medium (SMEM: DMEM/F12, ITS and 
20 ng ml−1 FGF2) for an additional 14 days with daily medium change. Cells were 
then cultured in Skeletal Muscle Differentiation Medium (SMDM: DMEM/F12 
and ITS only) for an additional 18 days. 

Cardiomyocyte differentiation. Cardiomyocyte differentiation was performed 
with adaptation based on the inhibition of GSK3 and Wnt pathways”. Briefly, 
PSCs were collected after Accutase (Life Technologies) treatment and cultured 
on Matrigel-coated plates in RPMI supplemented with B27 without insulin 
(Invitrogen) to 80-90% confluency. Cells were then incubated with 12 μM 
CHIR99021 (Selleckchem) for 16 h. At day 3, cells were incubated with 5 μM 
IWP2 (Tocris) for 48 h. At day 7, medium was replaced with RPMI supplemented 
with complete B27. Medium was replaced every 3 days. Contracting cardiomyo- 
cytes were observed on day 12 of differentiation. 

Immunocytochemistry. Cultured cells were fixed with 4% paraformaldehyde for 
15 min at room temperature and then permeabilized with 0.2% Triton X-100 in 
PBS for 10 min. Cells were washed 3X with PBST (PBS 1X, 0.02% Tween-20) and 
blocked with 10% goat or donkey serum (Sigma) for 1 h at room temperature. Cells 
were then incubated with primary antibodies diluted in PBST overnight at 4 °C, 
washed 3X with PBST and incubated with secondary antibodies (1:500, Molecular 
Probes) for 1 h at room temperature. Cells were washed 3X and mounted in 
Prolong Gold Antifade Mountant (Life Technologies). Image acquisition was 
performed on a Zeiss LSM 780 confocal microscope. Primary antibodies were: 
PAX6 (1:100, Covance), NESTIN (1:200, Millipore), MF20 (1:100, DSHB), 
OCT4 (1:100, Santa Cruz) and NANOG (1:40, R&D Systems). 

Teratoma assay. Approximately 3-5 million undifferentiated PSCs were injected 
into the hindleg muscle of 8-week-old, SCID, beige male mice (Charles River) 
using an 18-gauge needle. Six to seven weeks after injection, mice were euthanized 
and tumours were dissected, sectioned and histologically characterized for the 
presence of representative tissues as described previously’. The experiments were 
not randomized, and the investigators were not blinded to allocation during 
experiments and outcome assessment. 

mtDNA heteroplasmy analysis by ARMs-qPCR. The amplification refractory 
mutation system quantitative PCR assay (ARMs-qPCR) was used to measure 
mtDNA carryover in Leigh-NT1 and NT2 as previously described'®. Primers 
and TaqMan MGB probes were designed to detect the 8993T>G mutation site. 
The nondiscriminative (ND) and discriminative (D) assays were mixed and mea- 
sured with Rotor-Gene Multiplex PCR Kit (Qiagen). All reactions were run in 
duplicate with two different amounts of input DNA: 1-4 ng and 1:8 dilutions. The 
SDS software generated a standard curve using four eightfold dilutions plus a final 
fourfold dilution. The percentage of mtDNA carryover in relation to the total 
mtDNA content was calculated by the equation: heteroplasmy = 100 × (quantity 
D/quantity ND). ARMs-qPCR was also applied to detect 8993T>G, 3243A>G 
and 13513G>A heteroplasmy levels in fibroblasts and iPS cells using primers and 
TaqMan MGB probes specifically targeting the mutation sites. 
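
The carryover equation above can be written out as a short function. This is a minimal sketch under stated assumptions: the function names and the example quantities are hypothetical, and averaging the duplicate reactions is an illustrative choice rather than a step described in the text.

```python
def armsqpcr_heteroplasmy(quantity_d, quantity_nd):
    """Percent mutant mtDNA: 100 x (quantity D / quantity ND).

    quantity_d  -- quantity from the discriminative (mutation-specific) assay
    quantity_nd -- quantity from the non-discriminative (total mtDNA) assay
    """
    if quantity_nd <= 0:
        raise ValueError("non-discriminative quantity must be positive")
    if quantity_d < 0:
        raise ValueError("discriminative quantity cannot be negative")
    return 100.0 * quantity_d / quantity_nd


def mean_heteroplasmy(reactions):
    """Average the estimate over (D, ND) pairs, e.g. duplicate reactions.

    Assumption: duplicates are combined by a simple mean.
    """
    values = [armsqpcr_heteroplasmy(d, nd) for d, nd in reactions]
    return sum(values) / len(values)


# Hypothetical quantities read off a standard curve (arbitrary units).
duplicates = [(0.8, 100.0), (1.0, 100.0)]
carryover = mean_heteroplasmy(duplicates)
print(f"{carryover:.2f}% mutant mtDNA")
```

With these illustrative inputs the two reactions give 0.8% and 1.0%, so the mean falls below the 1% carryover level discussed for the nuclear transfer lines.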

Whole mtDNA sequencing analysis by MiSeq. Single PCR amplification of 
entire human mtDNA was performed with primers mtDNA-F-2120 
(GGACACTAGGAAAAAACCTTGTAGAGAGAG) and mtDNA-R-2119 
(AAAGAGCTGTTCCTCTTTGGACTAACA) under the following conditions: 94 °C for 1 min 
followed by 30 cycles of 98 °C for 10 s and 68 °C for 16 min, and then 72 °C for 10 min. 
PCR amplifications were performed using TAKARA LA Taq polymerase (Takara 
Biotechnology) and the concentrations of PCR products were measured using a 
Qubit 2.0 Fluorometer. The Nextera XT DNA sample preparation kit (Illumina) 
was used to prepare the libraries. Sequencing was performed on an Illumina MiSeq 
instrument and the data were analysed using NextGENe software. Briefly, 
sequence reads ranging from 100 to 200 bp were quality filtered and processed 
using BLAT algorithm. Sequence error correction feature (condensation) was 
performed to reduce false-positive variants and produce sample consensus 
sequence and variant calls. Alignment without sequence condensation was used 
to calculate percentage of mitochondrial genome with depth of coverage of 1,000. 
Starting from quality FASTQ reads, the reads were quality filtered and converted 
to FASTA format. Filtered reads were then aligned to the revised Cambridge 
Reference Sequence (rCRS) of the human mtDNA (NC_012920.1) followed by 
variant calling. Variant heteroplasmy was calculated by NextGENe software as 
follows: base heteroplasmy (mutant allele frequency %) = mutant allele (forward 
+ reverse)/total coverage of all alleles C, G, T, A (forward + reverse) × 100. 
The clinical significance of the variants was then analysed with MitoMaster (http:// 
www.mitomap.org/MITOMASTER/WebHome). 
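
The base heteroplasmy formula above amounts to pooling strand-specific read counts at one position. The following is a minimal sketch of that arithmetic, not the NextGENe implementation: the function name, the dictionary layout and the read counts are hypothetical.

```python
BASES = "ACGT"


def base_heteroplasmy(forward, reverse, mutant_base):
    """Mutant allele frequency (%) at a single mtDNA position.

    forward, reverse -- dicts mapping base -> read count on each strand
    mutant_base      -- the base regarded as the mutant allele
    """
    if mutant_base not in BASES:
        raise ValueError("mutant_base must be one of A, C, G, T")
    # Total coverage of all four alleles, both strands pooled.
    total = sum(forward.get(b, 0) + reverse.get(b, 0) for b in BASES)
    if total == 0:
        raise ValueError("no coverage at this position")
    mutant = forward.get(mutant_base, 0) + reverse.get(mutant_base, 0)
    return 100.0 * mutant / total


# Hypothetical pile-up at an A>G site: 1,000 reads in total, 300 carrying G.
fwd = {"A": 350, "G": 150}
rev = {"A": 350, "G": 150}
print(base_heteroplasmy(fwd, rev, "G"))
```

Pooling both strands before dividing, as the formula specifies, keeps the estimate weighted by actual coverage rather than averaging two per-strand percentages.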

Live cell oxygen consumption. XF24 or XF96 extracellular flux analysers 
(Seahorse Biosciences) were used to measure oxygen consumption rates (OCR) 
as described’”. In brief, stem-cell-derived fibroblasts were seeded at a density of 
50,000 cells per well of a XF24 cell culture microplate and incubated for 24 h to 
ensure attachment. Before assay, cells were equilibrated for 1 h in unbuffered 
XF assay medium supplemented with 25mM glucose, 1 mM sodium pyruvate, 
2 mM GlutaMAX, 1× nonessential amino acids and 1% FBS in a non-CO2 incubator. 
Mitochondrial processes were examined through sequential injections of 
oligomycin (0.5 μg ml−1), carbonyl cyanide 4-(trifluoromethoxy) phenylhydrazone 
(FCCP, 1 μM) and rotenone (0.5 μM)/antimycin A (1 μM). Indices of 
mitochondrial function were calculated as basal respiration rate (baseline 
OCR − rotenone/antimycin A OCR), ATP dependent (basal respiration rate − 
oligomycin OCR), maximal respiration rate (FCCP OCR − rotenone/antimycin 
A OCR) and oxidative reserve (maximal respiration rate − basal respiration 
rate). For other cell types, an XF96 extracellular flux analyser was used 
with 20,000 cells seeded to each well of a XF96 cell culture microplate. After a 
24-h attachment period, mitochondrial processes were examined using the same 
protocol as above. Each plotted value was normalized to total protein 
quantified using a Bradford protein assay (Bio-rad). Results were presented as 
mean ± s.e.m. One-way ANOVA was used for three group comparisons and 
Student’s t-test was used for two group comparisons. A P value less than 0.05 
was considered significant. 
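
The four respiration indices defined above can be sketched as follows. This is an illustration under stated assumptions rather than the analysis code used in the study: the class and field names are hypothetical, the readings are invented, and each reading is divided by total protein before the indices are derived. The ATP-dependent term follows the wording of the text literally.

```python
from dataclasses import dataclass


@dataclass
class OcrTrace:
    """Mean OCR readings for one well (hypothetical field names)."""
    baseline: float    # before any injection
    oligomycin: float  # after oligomycin
    fccp: float        # after FCCP
    rot_aa: float      # after rotenone/antimycin A
    protein: float     # total protein in the well, used for normalization


def respiration_indices(trace):
    """Derive the four indices from protein-normalized OCR readings."""
    if trace.protein <= 0:
        raise ValueError("protein amount must be positive")
    baseline = trace.baseline / trace.protein
    oligomycin = trace.oligomycin / trace.protein
    fccp = trace.fccp / trace.protein
    rot_aa = trace.rot_aa / trace.protein

    basal = baseline - rot_aa
    # As worded in the text: basal respiration rate minus oligomycin OCR.
    # Some analyses subtract oligomycin OCR from baseline OCR instead.
    atp_dependent = basal - oligomycin
    maximal = fccp - rot_aa
    reserve = maximal - basal
    return {
        "basal": basal,
        "atp_dependent": atp_dependent,
        "maximal": maximal,
        "oxidative_reserve": reserve,
    }


# Hypothetical well: readings in arbitrary OCR units, 10 units of protein.
indices = respiration_indices(OcrTrace(120.0, 40.0, 200.0, 20.0, 10.0))
print(indices)
```

Because oxidative reserve is the difference between maximal and basal rates, a line with a low FCCP response shows a small reserve even when its basal rate looks normal.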

Flow cytometric analysis. The efficiency of differentiation protocols was assessed 
by FACS. For cardiomyocyte differentiation, 0.25 million cells were fixed in the 
presence of 1% (vol/vol) paraformaldehyde at room temperature for 20 min. Fixed 
cells were then incubated in 90% (vol/vol) cold methanol for 15 min at 4 °C, rinsed 
two times and incubated overnight at 4 °C with primary antibodies against 
GATA4 (Santa Cruz) and cTnT (Pierce). After staining, cells were rinsed two 
times and incubated in the presence of 1:1,000 secondary antibodies (donkey 
Alexa 488 and 567; Molecular Probes) for 30 min. After staining, cells were washed 
two times and re-suspended for analysis. 

Real time RT-PCR. RNA was isolated using RNeasy kit (Qiagen) as per manufacturer’s 
instructions. cDNA synthesis was performed using the iScript cDNA 
synthesis kit for RT-PCR (BioRad). Real-time PCR was performed using the SYBR 
Green Supermix (BioRad). The levels of expression of respective genes were nor- 
malized to corresponding 18S values and are shown as fold change relative to the 
value of the control sample. All reactions were done in triplicate. 
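
The text does not state how the normalized fold change was computed. One common choice for this kind of 18S-normalized comparison is the 2^(-ΔΔCt) method, sketched below as an assumption; the function names and Ct values are hypothetical.

```python
def mean_ct(replicates):
    """Average Ct across technical replicates (e.g. a triplicate)."""
    return sum(replicates) / len(replicates)


def fold_change(gene_sample, ref_sample, gene_control, ref_control):
    """Relative expression by the 2^(-ddCt) method (assumed, not stated).

    Each argument is a list of replicate Ct values: the gene of interest
    and the 18S reference, in the test sample and in the control sample.
    """
    d_ct_sample = mean_ct(gene_sample) - mean_ct(ref_sample)
    d_ct_control = mean_ct(gene_control) - mean_ct(ref_control)
    dd_ct = d_ct_sample - d_ct_control
    return 2.0 ** (-dd_ct)


# Hypothetical triplicates: the gene crosses threshold two cycles earlier
# in the sample than in the control, with identical 18S values.
fc = fold_change(
    gene_sample=[24.0, 24.5, 23.5],
    ref_sample=[10.0, 10.0, 10.0],
    gene_control=[26.0, 26.0, 26.0],
    ref_control=[10.0, 10.0, 10.0],
)
print(fc)
```

The method assumes roughly 100% amplification efficiency for both the target and the 18S reference; if that does not hold, an efficiency-corrected model would be needed.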

RNA-seq library construction and data analysis. RNA was isolated with Micro- 
to-Midi Total RNA Purification System (Life Technologies), quality evaluated 
(RNA6000 Nano Kit and BioAnalyzer 2100, Agilent) made into sequencing 
libraries, sequenced and mapped as previously described’*. Libraries were con- 
structed using 500ng input RNA per sample. Approximately 27 million reads 
were generated per sample, and 73% of these reads were uniquely mapped. Counts 
for each gene were quantified using the python script rpkmforgenes and annotated 
using Ensembl GRCh37. Genes without at least one sample with at least five reads 
were removed from the analysis. The count data were normalized and differential 
expression analysis was performed using the R (v.3.1.1) package DESeq2 (v.1.4.5). 
Briefly, DESeq2 uses negative binomial generalized linear models and shrinkage 
estimation for dispersions and fold changes to improve stability and interpretability 
of the estimates (ref. 29). It reports a P value and an adjusted P value using the 
Benjamini-Hochberg procedure. Genes with an adjusted P value less than 0.05 
were considered differentially expressed unless otherwise noted. Heat maps were 
constructed using the R (v.3.1.1) package gplots (v.2.14.2). Each variable was 
standardized by subtraction of its mean value and division by its standard deviation 
across all samples. All functional enrichment analyses were generated 
using the Genomic Regions Enrichment of Annotations Tool (v. 2.0.2; ref. 30) with 
default settings. Hierarchical clustering was performed with the R package 
pvclust, with Euclidean distance and average linkage with 10,000 bootstraps. 
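
Three of the steps above are simple enough to sketch: the low-count filter, the per-gene standardization used for heat maps, and the Benjamini-Hochberg adjustment. This is an illustration in plain Python, not the DESeq2 or gplots code; the function names are hypothetical, and the use of the sample standard deviation is an assumption.

```python
from statistics import mean, stdev


def filter_genes(counts, min_reads=5):
    """Keep genes with at least one sample having at least min_reads reads."""
    return {g: c for g, c in counts.items() if max(c) >= min_reads}


def standardize(values):
    """Subtract the mean and divide by the standard deviation.

    Assumption: the sample standard deviation (n - 1) is used.
    """
    m, s = mean(values), stdev(values)
    if s == 0:
        return [0.0 for _ in values]
    return [(v - m) / s for v in values]


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


kept = filter_genes({"geneA": [0, 1, 4], "geneB": [0, 5, 2]})
z = standardize([1.0, 2.0, 3.0])
padj = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
significant = [p < 0.05 for p in padj]
print(kept, z, padj, significant)
```

In this toy example only the smallest P value stays below 0.05 after adjustment, which shows why the adjusted threshold is stricter than the raw one.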


28. Paull, D. et al. Nuclear genome transfer in human oocytes eliminates 
mitochondrial DNA variants. Nature 493, 632-637 (2013). 

29. Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and 
dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014). 

30. McLean, C. Y. et al. GREAT improves functional interpretation of cis-regulatory 
regions. Nature Biotechnol. 28, 495-501 (2010). 
Extended Data Figure 1 | mtDNA genotyping by Sanger sequencing.
a, Chromatographs showing mtDNA genotyping at 3243 position (arrow)
in representative MELAS iPS cells. b, Chromatographs showing mtDNA
genotyping at 8993 position (arrow). c, mtDNA at 13513 position (arrow) in
representative iPS cells derived from Leigh syndrome patients. d, Chromatographs
showing either wild-type A or mutant G allele at position 3243 in
representative MELAS fibroblast clones. e, mtDNA genotyping demonstrated
that all Leigh-iPS cell lines and Leigh-fib contain a G mutation allele at
mtDNA position 8993. f, mtDNA genotyping demonstrated that Leigh-fib and
Leigh-iPS1 cell lines contained a C mutant allele at position 4216 and a
G mutant allele at position 8993, while Leigh-NT1 line carried oocyte mtDNA
with a wild-type T allele at both positions.

Extended Data Figure 2 | Cytogenetic, pluripotency and teratoma analyses.
a, MELAS-iPS1 and MELAS-iPS2 expressing NANOG detected by immunocytochemistry.
Scale bars, 200 μm. b, Histological analyses of teratoma tumours
produced after injections of MELAS-iPS1 and MELAS-iPS2 cells into SCID
mice. Scale bars, 200 μm. c, Cytogenetic G-banding analysis confirmed that
Leigh-NT1 and Leigh-iPS1 exhibited normal 46XY karyotypes and Leigh-NT2
exhibited a XXXY tetraploid karyotype. d, Leigh-NT1 and Leigh-iPS1 cells
expressed OCT4 and NANOG. Scale bars, 200 μm. e, Histological analyses of
teratoma tumours produced after injections of Leigh-NT1 and Leigh-iPS1
cells into SCID mice. Scale bars, 200 μm. Haematoxylin and eosin staining of
teratoma sections identify derivatives of ectoderm, mesoderm and endoderm.

Extended Data Figure 3 | Metabolic function in differentiated cells from
MELAS iPS cells. a, OCR/ECAR ratio in MELAS-iPS cells. Mutant MELAS-iPS1
and MELAS-iPS3 displayed significantly decreased OCR/ECAR ratios
compared to wild-type MELAS-iPS2 (P < 0.05), indicating a greater reliance on
glycolysis (n = 9 per cell line, biological replicates). b, OCR/ECAR ratio in
MELAS-iPS cell derived fibroblasts (n = 10 per cell line, biological replicates).
c, Immunofluorescence analysis for neural progenitor markers in MELAS-iPS
derived NPCs. Scale bar, 100 μm. d, Quantitative analysis of PSC (OCT4
and NANOG) or NPC (SOX1, NESTIN and PAX6) marker expression in
MELAS-iPS cells and NPCs (n = 3 per cell line, biological replicates). e, OCR of
MELAS-iPS cell derived NPCs (n = 6 per cell line, biological replicates). Error
bars are mean ± s.e.m. Significance established with one-way ANOVA with
Tukey’s multiple comparison test.

Extended Data Figure 4 | Metabolic function in differentiated cells from
Leigh syndrome PSCs. a, OCR/ECAR ratio in Leigh-iPS1, Leigh-iPS2 and
Leigh-NT1 derived fibroblasts, parental and oocyte donor fibroblasts (n = 9, 8,
10, 9 and 8 per cell line, respectively, biological replicates). b, Immunofluorescence
analysis of Leigh-iPS1- and Leigh-NT1-derived skeletal muscle cells
labelled with MF20 and myogenin antibodies. Scale bar, 100 μm.
c, Cardiomyocyte differentiation efficiency in Leigh-iPS1 and Leigh-NT1
evaluated by FACS for cTnT-Alexa 647 and GATA4-FITC antibodies
(n = 3 per cell line, biological replicates). Error bars are mean ± s.e.m.
Significance established with one-way ANOVA with Tukey’s multiple
comparison test.

Extended Data Figure 5 | Metabolic function in hESO-NT1 and hESO-8.
a, Immunofluorescence analysis of hESO-NT1 and hESO-8 derived NPCs with
nestin and PAX6 antibodies. Scale bar, 100 µm. b, Metabolic profiles of NPCs
differentiated from hESO-NT1 and hESO-8 (n = 6 per cell line, biological
replicates). c, Immunofluorescence analysis of hESO-NT1 and hESO-8 derived
cardiomyocytes with troponin I and NKX2.5 antibodies. Scale bar, 100 µm.
d, Efficiency of cardiomyocyte differentiation in hESO-NT1 and hESO-8
evaluated by FACS analysis for cTnT-Alexa 647 and GATA4-FITC antibodies
(n = 3 per cell line, biological replicates). e, OCR of hESO-NT1 and hESO-8
derived cardiomyocytes (n = 6 per cell line, biological replicates). Error bars
are mean ± s.e.m. Significance established with Student’s t-test.
Extended Data Figure 6 | RNA-seq analyses of fibroblasts differentiated
from MELAS and Leigh syndrome PSCs carrying wild-type and mutant 
mtDNA. a, Heat map showing all differentially expressed 1,118 genes 
(adjusted P value < 0.05) between fibroblasts differentiated from mutant 
MELAS iPS cells (n = 4 from biological duplicates of MELAS-iPS2 and 
MELAS-iPS4) and wild-type MELAS iPS cells (n = 4 from biological duplicates 
of MELAS-iPS1 and MELAS-iPS3). b, Heat map demonstrating differentially 
expressed 2,950 genes (adjusted P value < 0.05) between fibroblasts derived 
from wild-type Leigh-NT1 (biological duplicates) and mutant Leigh iPS cells 
(n = 6 from biological duplicates of Leigh-iPS1, Leigh-iPS2 and Leigh-iPS3). 
c, Hierarchical clustering using Euclidean distance and average linkage using
pvclust, which employs a multiple bootstrap resampling algorithm to
calculate the approximately unbiased (AU, red) and bootstrap probability
(BU, green) values for cluster distinctions. Hierarchical clustering showed that
the Leigh-NT1 fibroblasts were similar to hESO-NT1, hESO-NT2, hESO-7 and
hESO-8 fibroblasts. d, Mean log2 normalized counts ± s.e.m. for genes
previously reported to be differentially expressed in MELAS cytoplasmic hybrid 
clones and involved in metabolic and stress response, signalling pathways 
and epigenetic modifying processes (wild type fibroblast; n = 14 from 
biological duplicates of 7 independent cell lines; mutant fibroblast n = 10 from 
biological duplicates of 5 independent cell lines). 
Extended Data Figure 7 | RNA-seq analysis of the mitochondrial
transcriptome. Circular heat map displaying average expression levels for all
mitochondrial genes grouped by sample differentiation status and presence or
absence of a mutation in the mitochondrial genome (Fib mutant (including
primary fibroblasts and PSC derived fibroblasts) with mutant mtDNA n = 14,
biological duplicates of 7 independent cell lines; Fib wild type (including
primary fibroblasts and PSC derived fibroblasts) with wild-type mtDNA
n = 14, biological duplicates of 7 independent cell lines; PSC mutant
(undifferentiated IVF-ESC, NT-ESC and iPS cells) with mutant mtDNA n = 3;
PSC wild type (undifferentiated IVF-ESC, NT-ESC and iPS cells) with wild-type
mtDNA n = 12). The expression of mtDNA-encoded genes was similar
irrespective of 3243A>G or 8993T>G mutations (adjusted P value > 0.05).
Extended Data Table 1 | Mutation loads in Leigh syndrome iPS cells with homoplasmic mutations


PSC lines ee menos 
Leigh-iPS1 100 100 
Leigh-iPS2 100 100 
Leigh-iPS3 100 100 
Leigh-iPS4 100 100 
Leigh-iPS5 100 100 
Leigh-NT1 0 0 
Extended Data Table 2 | Quantitative mutant mtDNA carryover analysis in Leigh-NT1


             % mutant mtDNA (±SD)
Cell line    P5           P15            P20            P30            P40
Leigh-NT1    0.14±0.06    undetectable   undetectable   undetectable   undetectable
Extended Data Table 3 | Short tandem repeat analysis of oocyte donors, Leigh-NT2 and iPS cells from the Leigh syndrome patient
Live imaging RNAi screen reveals genes essential
for meiosis in mammalian oocytes


Sybille Pfender1*, Vitaliy Kuznetsov1*, Michal Pasternak1*, Thomas Tischer1, Balaji Santhanam1 & Melina Schuh1


During fertilization, an egg and a sperm fuse to form a new 
embryo. Eggs develop from oocytes in a process called meiosis. 
Meiosis in human oocytes is highly error-prone’”, and defective eggs 
are the leading cause of pregnancy loss and several genetic disorders 
such as Down’s syndrome*°. Which genes safeguard accurate 
progression through meiosis is largely unclear. Here we develop 
high-content phenotypic screening methods for the systematic iden- 
tification of mammalian meiotic genes. We targeted 774 genes by 
RNA interference within follicle-enclosed mouse oocytes to block 
protein expression from an early stage of oocyte development 
onwards. We then analysed the function of several genes simulta- 
neously by high-resolution imaging of chromosomes and micro- 
tubules in live oocytes and scored each oocyte quantitatively for 
50 phenotypes, generating a comprehensive resource of meiotic 
gene function. The screen generated an unprecedented annotated 
data set of meiotic progression in 2,241 mammalian oocytes, which 
allowed us to analyse systematically which defects are linked to 
abnormal chromosome segregation during meiosis, identifying 
progression into anaphase with misaligned chromosomes as well 
as defects in spindle organization as risk factors. This study demon- 
strates how high-content screens can be performed in oocytes, and 
allows systematic studies of meiosis in mammals. 

Meiosis is still much more poorly understood than mitosis, 
especially in mammals. Systematic screens have greatly increased 
our understanding of mitosis. However, high-content screens for
mammalian meiotic genes have so far been precluded by various technical
challenges. For instance, mammalian oocytes are only available in
small numbers; genetic screens in mammals are slow; and RNA inter- 
ference (RNAi) in oocytes is inefficient owing to large amounts of 
stored protein. Oocytes accumulate proteins while they grow within 
follicles in the ovary®. Thus, we established a protocol that allowed us to 
block protein expression by RNAi during follicle growth and subse- 
quently to assess gene function by quantitative live imaging (Fig. 1a). 
Briefly, we microinjected short interfering RNAs (siRNAs) into small 
follicle-enclosed oocytes and grew the follicles in vitro, combining 
and modifying previous methods’°. When the oocytes had reached 
their full size, we isolated and labelled them, and imaged meiosis live 
for around 18 h on confocal microscopes using automated imaging 
routines. 

The oocytes grown in vitro resembled those grown in vivo: first, the 
efficiency of nuclear envelope breakdown (NEBD) and polar body 
extrusion, as well as the timing of meiotic progression, were similar 
(Fig. 1b-d and Extended Data Fig. 1d, e); second, their transcriptome
was related (Extended Data Fig. 2a-c and Supplementary Table 1); 
third, they developed into blastocysts with similar efficiency upon 
fertilization (Extended Data Fig. 1f, g)°. 

Follicle culture and microinjection are labour-intensive, precluding
genome-wide screens. Instead, we preselected 774 target genes that
were highly expressed in mouse oocytes, while excluding messenger
RNAs (mRNAs) stored for embryo development. We took advantage
of two microarray data sets’®"', which compare the expression profile
of oocytes with the profiles of other cell types and preimplantation
mouse embryos, respectively (Fig. 1f). Only genes that were significantly
upregulated in oocytes in both data sets were selected for the
screen.

Figure 1 | RNAi screen in live oocytes.

To achieve high throughput, we targeted 12 genes simultaneously 
(Fig. 1a, e). Co-depletion of several genes led to the expected pheno- 
type for genes with known functions. For instance, mixes targeting one 
of the zona pellucida genes (Zp1, Zp2 or Zp3) together with 11 other 
genes prevented formation of the zona’* (Extended Data Fig. 1h, i), and 
mixes targeting spindle assembly checkpoint proteins led to the 
expected earlier onset of anaphase (for example, mix 33P1-2-3-4-5-6 
targeting Bub1 in Supplementary Table 2). 

The targeting of all 774 genes enabled us to make videos of 2,241 
individual oocytes, including 1,210 RNAi-treated and 1,031 control 
oocytes. We scored every oocyte for 41 possible defects and deter- 
mined 5 characteristic meiotic time points as well as the spindle length 
and width in meioses I and II (Supplementary Table 3; scored para- 
meters are described in Extended Data Fig. 3). The frequencies with 
which different defects were observed in RNAi-treated and control 
oocytes are plotted in Fig. 2 and Extended Data Figs 4a-i and 5. To 
identify significant hits, we calculated the z-score of individual mixes 
for different categories. These quantifications resulted in a compre- 
hensive annotated resource of defects (Supplementary Tables 2 and 3 
and Extended Data Fig. 2d). Supplementary Table 2 allows users to 
easily query if siRNA mixes targeting their gene of interest result in
defects in oocyte meiosis or to identify mixes causing defects in the
stage of meiosis they are studying. This Supplementary Table also
includes hyperlinks to the original video files for further assessment.

240 | NATURE | VOL 524 | 13 AUGUST 2015

[Figure 2 panels: defect categories labelled include stretching defect, chromosome misalignment, chromosomes lost in cytoplasm, lagging chromosomes, metaphase I arrest and spindle collapse; axis ticks run from 0 to 35.]

Figure 2 | Defects during meiosis I in siRNA-

Proof-of-principle experiments demonstrated that defects observed 
upon targeting several genes simultaneously could be allocated to 
individual genes by stepwise splitting of siRNAs into smaller pools 
(Fig. 1g and Extended Data Fig. 4j). Several of the identified genes have 
not yet been implicated in mouse oocyte meiosis, demonstrating that 
this screening strategy is suitable to identify new meiotic genes. Hits 
were verified typically three times when the siRNA mixes were split to 
track down the genes that caused the phenotype of interest. To confirm 
the observed defects, siRNAs were microinjected again upon gene 
identification. In addition, specificity was confirmed by microinjection 
of individual siRNAs and rescue experiments as detailed below. 
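
The stepwise splitting of siRNA pools described above can be sketched in code. The following is a minimal illustration only, not the authors' procedure: the halving scheme, the gene names and the `assay` function (standing in for a microinjection and imaging experiment) are all hypothetical, and the sketch assumes that a phenotype is caused by individual genes rather than by combinations.

```python
from typing import Callable, List, Sequence


def split_pool(genes: Sequence[str]) -> List[List[str]]:
    """Split a pool into two halves (the first half gets any extra gene)."""
    mid = (len(genes) + 1) // 2
    return [list(genes[:mid]), list(genes[mid:])]


def deconvolve(genes: Sequence[str],
               shows_phenotype: Callable[[Sequence[str]], bool]) -> List[str]:
    """Return the genes whose own siRNAs reproduce the phenotype.

    `shows_phenotype` is a hypothetical stand-in for one experiment:
    it reports whether a pool targeting `genes` gives the defect.
    """
    if not shows_phenotype(genes):
        return []            # pool is clean, stop splitting
    if len(genes) == 1:
        return list(genes)   # phenotype allocated to a single gene
    hits: List[str] = []
    for sub_pool in split_pool(genes):
        hits.extend(deconvolve(sub_pool, shows_phenotype))
    return hits


# Hypothetical 12-gene mix in which one gene causes the defect.
mix = [f"gene{i}" for i in range(1, 13)]
causative = {"gene7"}


def assay(pool: Sequence[str]) -> bool:
    return any(gene in causative for gene in pool)


hits = deconvolve(mix, assay)
print(hits)
```

With 12 genes per mix, halving in this way narrows a single causative gene down in a handful of rounds, which is in line with the small number of verification rounds mentioned in the text.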

The screen identified several genes that control meiotic progression, 
including Dusp7, a poorly characterized dual-specificity phosphatase. 
More than 40% of Dusp7-depleted oocytes failed to undergo NEBD 
(Fig. 3a, b and Supplementary Video 1). In the remaining 60%, NEBD 
was significantly delayed (Fig. 3c). NEBD could be rescued by wild- 
type DUSP7 fused with enhanced green fluorescent protein (eGFP- 
DUSP7), but not by the catalytically inactive eGFP-DUSP7 C333S 
mutant (Fig. 3b), indicating that the phosphatase activity of DUSP7 
is essential for NEBD. eGFP-DUSP7 was excluded from the nucleus 
(Fig. 3d), suggesting that it promotes NEBD by dephosphorylating 
cytoplasmic proteins. Together, these data identify Dusp7 as a phos- 
phatase essential for NEBD in oocytes. 
Figure 3 | Dusp7 and Mastl depletion phenotypes. a, Oocytes microinjected
with control or Dusp7 siRNAs. Chromosomes in magenta. Quantification of
phenotype in b, c. Scale bar, 10 µm. b, c, Efficiency (b) and timing of NEBD (c)
in oocytes microinjected with Dusp7 siRNAs alone or together with mRNA
encoding eGFP-DUSP7 or eGFP-DUSP7 C333S. d, Localization of DUSP7
during oocyte maturation. Live oocytes expressing DUSP7 fused with eGFP
(green) and H2B fused with monomeric red fluorescent protein (mRFP;
magenta, chromosomes). Scale bar, 10 µm. Representative for 36 oocytes
from 5 experiments. e, Oocytes microinjected with control or Mastl siRNAs.
Microtubules in green, chromosomes in magenta. Arrows highlight lagging
chromosomes. Quantification of phenotypes in f-i. Scale bar, 10 µm.
f-i, Oocytes microinjected with different Mastl siRNAs alone or together with
mRNA encoding human eGFP-MASTL were scored for formation of
pronuclei (f), lagging chromosomes (g), and efficiency (h) and timing of
NEBD (i). Number of oocytes is given next to bars. P values were calculated
with Fisher’s exact (b, g, h) or Student’s t-tests (c, i). Data from six (b, c),
two (f) or five (g-i) independent experiments. The box plots in c and i show
median (line), mean (small square), 5th, 95th (whiskers) and 25th and 75th
percentile (boxes).


Figure 4 | Factors implicated in chromosome segregation errors.
a, b, The efficiency (a) and timing (b) of progression into anaphase in
control oocytes with aligned and misaligned chromosomes. Number of oocytes
is given next to bars. P value was calculated with Fisher’s exact test. Data
from 52 independent experiments. c, Defects significantly more likely to
occur in oocytes with lagging chromosomes. Significance was calculated with
Fisher’s exact test by comparing the prevalence of other defects in oocytes
with and without lagging chromosomes, and is specified by asterisks next to
arrows, with ****P < 0.0001; ***P < 0.001; **P < 0.01; *P < 0.05. The circle
area reflects the percentage of oocytes with lagging chromosomes in which
each defect was observed.
Another gene essential for meiotic progression was Eif4enif1.
Mutations in Eif4enif1 have recently been detected in a family with 
premature ovarian failure’’, but the mechanism by which Eif4enif1 
affects fertility is unclear. Our results show that Eif4enif1 is essential 
for NEBD and resumption of meiosis (Extended Data Fig. 6a, b). 

The screen also provided insights into causes of chromosome 
segregation errors in oocytes. Several genes were essential for accurate 
chromosome segregation, including the uncharacterized genes Fam46b 
and Fam46c (family with sequence similarity 46), Aspm'* (Extended 
Data Fig. 7 and Supplementary Video 2), Birc5 (Survivin)'* (Extended 
Data Figs 6), Ttk’® and Mastl (Fig. 3e, g). MASTL was also required to 
prevent exit from meiosis after anaphase I (Fig. 3e, f), but dispensable 
for meiotic resumption, progression into anaphase, chromosome condensation
or cytokinesis (Fig. 3h, i and Extended Data Fig. 8), consistent
with a recent study”. 

The screen also allowed us to analyse on a global level how chro- 
mosome segregation errors arise in oocytes. With data from 2,241 
oocytes, it generated the largest existing data set, to our knowledge,
of meiosis in mammalian oocytes (Supplementary Table 2). Evalua- 
tion of the control data set identified progression into anaphase with 
misaligned chromosomes as a major contributor to chromosome 
segregation errors: misaligned chromosomes only delayed but did 
not prevent progression into anaphase (Fig. 4a, b). This is consistent 
with the model that the spindle assembly checkpoint in mammalian 
oocytes is less stringent than in mitosis**. 

We were also able to analyse systematically which defects in the 
oocyte precede chromosomes that lag behind during anaphase. This 
is of particular interest because lagging chromosomes can lead to 
inappropriate partitioning of chromosomes upon cytokinesis and 
are a major cause of aneuploidy’*’. We identified chromosome align- 
ment, individualization and stretching as well as spindle defects as risk 
factors (Fig. 4c). A systematic representation of how different defects 
in oocytes were linked is shown in Extended Data Figs 9 and 10. 
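
The association test used for Fig. 4c compares how often a defect occurs in oocytes with and without lagging chromosomes. The sketch below shows one way such a two-tailed Fisher's exact test can be computed from a 2×2 table; it is an illustration only (the Methods state that the authors used Microsoft Excel), and the counts in the example are hypothetical, not data from the screen.

```python
from math import comb


def fisher_exact_two_tailed(a: int, b: int, c: int, d: int) -> float:
    """Two-tailed Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows: oocytes with / without lagging chromosomes.
    Columns: defect observed / not observed.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum all tables that are at least as unlikely as the observed one.
    p_value = sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )
    return min(1.0, p_value)


# Hypothetical counts, for illustration only.
p = fisher_exact_two_tailed(30, 70, 100, 900)
print(f"P = {p:.3g}")
```

The small tolerance factor guards against floating-point ties when deciding which tables count as "at least as extreme".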

In summary, we have established an experimental system that now 
allows systematic studies of meiosis in mammals. The screening 
approach is scalable and could be adapted to investigate fertilization 
or embryo development. The follicle-based RNAi method will also be a 
powerful tool for individual gene studies, as it allows proteins with low 
turnover to be depleted in oocytes and pre-implantation embryos. The 
techniques presented in this study should thus facilitate a more rapid 
accumulation of knowledge about meiosis and early embryo develop- 
ment in mammals, which is crucial to improve methods for treating 
fertility problems in humans. 


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
METHODS


Preparation, microinjection and culture of follicles. All mice were maintained
in a specific pathogen-free environment according to UK Home Office
regulations. Ovaries were dissected from two to five 10- to 12-day-old (C57BL × CBA)
F1 females. To obtain individual follicles, the ovaries were incubated in modified
MEM-α (Gibco 12000-014) medium optimized for in vitro culture of follicles
supplemented with 0.026 M NaHCO3 (Sigma), 5678 U 100 ml−1 penicillin G
(Sigma) and 8265 U 100 ml−1 streptomycin (Sigma), 1× insulin/transferrin/selenium
solution (ITS; Sigma; stock was 100×), 5% fetal bovine serum (FBS; Gibco
16000044) and 0.01 µg ml−1 follicle stimulating hormone (FSH; National
Hormone and Peptide Program, NDDK-oFSH-20) that was supplemented with
2 mg ml−1 collagenase (Roche) for about 30-40 min total. During incubation with
collagenase, the ovaries were pipetted up and down every 10 min to facilitate
dissociation and then washed through several droplets of follicle culture medium
without collagenase. The follicles were then randomly allocated into control
siRNA and RNAi mix injection groups. Intact follicles were then loaded into a
microinjection chamber prepared with two double stick tapes as spacer and
microinjected as previously described’ in culture medium supplemented with
HEPES (Sigma). Upon microinjection, follicles were cultured at 37 °C in 5%
CO2 on membrane inserts in 6- or 12-well culture dishes filled with follicle
culture medium (see above). For most experiments, collagen-coated inserts from
Corning were used (Transwell COL), but also Transwell-Clear inserts that were
coated with 10 µg cm−2 collagen solution type I from rat tail (Sigma), BD
Matrigel Basement Membrane (BD Biosciences; thin coating method) as well
as BD BioCoat filters were successfully used (Extended Data Fig. 1c). Medium
surrounding the filter was replaced with fresh medium every 3-4 days. Oocytes
were isolated from follicles after 10-11 days of in vitro culture. To this end,
the oocytes were stripped with a small glass pipette and released into modified
M2 medium that contained 10% FBS instead of BSA as well as 100 ng ml−1
FSH and dbcAMP. Oocytes were subsequently microinjected with mRNAs
encoding eGFP-α-tubulin (spindle) and H2B-mRFP (chromosomes). Upon
microinjection, oocytes were cultured for up to 3.5 h at 37 °C until fluorescent
proteins were expressed. Oocytes were then released from prophase arrest by
transferring them into medium without dbcAMP. In vivo grown control oocytes
from 5-week-old (C57BL × CBA) F1 females were obtained by puncturing
isolated ovaries with hypodermic needles and then microinjected with mRNAs as
described above.

In vitro fertilization. Oocytes grown for 10-11 days within follicles in vitro or
obtained from adult (C57BL × CBA) F1 females (7-12 weeks old) were denuded
and matured in follicle-culture medium (see above). Meiosis II oocytes were placed
in 50 µl of EmbryoMax HTF medium (Millipore) and fertilized with 10 µl of sperm
suspension from 5- to 13-week-old (C57BL × CBA) F1 males. The sperm
suspension was prepared by dissecting two cauda epididymides from one male in 2 ml of
HTF medium. After 4-6 h, zygotes were transferred to KSOM+AA (Millipore)
and cultured for 5 days at 37 °C.

Expression constructs and mRNA synthesis. To generate constructs for in vitro 
mRNA synthesis, the previously published protein coding sequences of Mastl’° 
and α-tubulin”’ were fused with eGFP and inserted into pGEMHE for in vitro
transcription. The Mus musculus Dusp7 ORF (derived from NM_153459) was
amplified by PCR from mouse oocyte cDNA. The resulting product of around
1,300 base pairs contained a 5′-XhoI and a 3′-EcoRI restriction site, which were
used to insert it into pGEMHE-eGFP carboxy terminally of the eGFP tag. These
constructs, as well as pGEMHE-H2B-mRFP1”, pGEMHE-eGFP-MAP4” and
pGEMHE-eGFP-LaminB1”, were linearized with AscI. Capped mRNA was
synthesized using T7 polymerase (mMessage mMachine kit, following the
manufacturer’s instructions, Ambion) and dissolved in 11 µl water. mRNA concentrations
were determined on ethidium bromide agarose gels by comparison with an RNA 
standard (Ambion). 

Confocal microscopy. Images were acquired with a Zeiss LSM710 confocal 
microscope equipped with a Zeiss environmental incubator box or a Zeiss 
LSM780 confocal microscope equipped with a Tokai Hit Stage Top Incubator, 
with a C-Apochromat 40×/1.2 W water immersion objective lens for live oocytes,
and a C-Apochromat 63×/1.2 W water immersion objective for fixed oocytes as
previously described”’. In some images, shot noise was reduced with a Gaussian 
filter. The z-projections were generated in Zen (Zeiss). 

Measurement of chromosome volumes. Oocytes were fixed for 30 min at 
37 °C in 100 mM HEPES (pH 7) (titrated with KOH), 50 mM EGTA (pH 7)
(titrated with KOH), 10 mM MgSO4, 2% formaldehyde (MeOH free) and 0.2%
Triton X-100. DNA was stained with 0.05 µg ml−1 Hoechst 33342 (Molecular
Probes). All stainings were performed in PBS, 0.1% Triton X-100, 3% BSA. 
Chromosome volumes were determined in three-dimensional volume reconstruc- 
tions using the surface function in Imaris (Bitplane). 
All siRNAs were purchased from QIAGEN. siRNAs were diluted in 96-well
plates to a concentration of 6.6 µM and stored at −80 °C. To pre-select genes, we
took advantage of two microarray data sets'®"’, which compare the expression 
profile of oocytes with the profiles of other cell types and preimplantation mouse 
embryos, respectively. Only genes that were highly significantly upregulated in 
oocytes in both data sets were selected for the screen, independently of whether 
they had previously been implicated in mitosis or meiosis to avoid any bias. We 
targeted each gene with a low complexity siRNA pool (three siRNAs per gene) 
(Supplementary Table 4), which on average leads to fewer off-target effects and a 
higher penetrance of phenotypes than individual siRNAs**’*. For the primary 
RNAi screen, siRNAs targeting different genes were mixed and microinjected to 
a final concentration of 5 nM each in the oocyte. For the functional
characterization of individual genes, siRNA concentrations of up to 0.2 µM final in the oocyte
were used. Protein ablation should always be assessed by secondary assays, because 
proteins that are generated in the very early stages of meiosis may still not 
be efficiently depleted if they do not turn over, even if the targeted transcript 
is reduced. 

Quantitative real-time PCR. mRNA was extracted using an RNeasy Mini Kit 
(Qiagen) and cDNA was generated using a High Capacity RNA-to-cDNA Kit 
(Applied Biosystems). Real-time PCR was performed with the 7900 HT Real- 
Time Fast PCR System (Applied Biosystems) using SYBR Green. GAPDH 
mRNA was used for normalization. 

RNA sequencing. Total RNA was isolated using NucleoSpin RNA XS (Macherey- 
Nagel) from oocytes grown in vitro after 10 days of follicle culture or from oocytes 
obtained directly from adult (C57BL × CBA) F1 females (7-11 weeks old). A total
of 50 oocytes with an intact nucleus per sample were used and three samples per 
group were collected. RNA was extracted using NucleoSpin RNA XS (Macherey- 
Nagel). A cDNA library was prepared using SMARTer UltraLow Input RNA for 
Sequencing (Clontech Laboratories) and the samples were processed by BGI Tech 
Solutions. The cDNA product was synthesized and amplified using a SMARTer 
PCR cDNA Synthesis Kit (Clontech Laboratories) from the total RNA (10 ng) of 
each sample. The cDNA was fragmented by Covaris E210 and the median insert 
length was about 200 base pairs. The paired-end cDNA library was prepared in 
accordance with Illumina’s protocols with an insert size of 200 base pairs and 
sequenced for 100 base pairs by HiSeq2000 (Illumina). 

Expression analysis. NOISeq. RNA-seq based measurements of transcript
abundances at the level of genes were represented by fragments per kilobase of
transcript per million fragments mapped (FPKMs). FPKM is conceptually similar to
the reads per kilobase per million reads sequenced (RPKM) measure, but it is easily
adaptable for sequencing data from one to higher numbers of reads from single
source molecules. To identify significantly differentially expressed genes between
oocytes grown in vitro and in vivo, we used a non-parametric method encoded in
NOISeq. For this, we first filtered for low count or abundance using the ‘CPM’ low
count filter of NOISeq. This yielded a reduction from the original 16,343 genes to
11,470 genes. Significant differential expression between oocytes grown in vivo
and in vitro was determined using NOISeq with the following parameters:
(1) ‘tmm’, trimmed mean of log2 FPKM, normalization; (2) biological replicates
data; and (3) probability of differential expression q being set to 0.8 or above and
log2 values being greater than or equal to 1 for upregulated genes, or less than or
equal to −1 for downregulated genes, in the in vitro group. This yielded 146
upregulated and 67 downregulated genes in oocytes grown in vitro. The vast majority
of genes (11,110) were unchanged between the two conditions.

DESeq2. RNA-seq counts were considered with two ‘conditions’, namely in vivo
and in vitro with three replicates. The standard protocol for DESeq2 differential
expression analyses was followed with default settings. We deemed genes to be
upregulated if log2 values were greater than or equal to 1, and downregulated if
log2 values were less than or equal to −1, in the in vitro group, with a
false discovery rate (FDR or padj of DESeq2) less than 0.01 or 1%. We chose a
particularly stringent false discovery rate because of the overall low expression
levels for transcripts. This yielded 282 upregulated and 163 downregulated
genes in vitro.
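
The DESeq2 cutoffs described above amount to a simple filter on two values per gene. The sketch below is a hypothetical illustration of that filter only; it does not call DESeq2, and the `GeneResult` class, the gene names and the numbers are invented for the example. It assumes the log2 values are fold changes of in vitro relative to in vivo.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class GeneResult:
    gene: str
    log2_fold_change: float  # in vitro relative to in vivo (assumed)
    padj: float              # adjusted P value (false discovery rate)


def classify(results: Iterable[GeneResult],
             lfc_cutoff: float = 1.0,
             fdr_cutoff: float = 0.01) -> Tuple[List[str], List[str]]:
    """Split genes into upregulated and downregulated lists."""
    up: List[str] = []
    down: List[str] = []
    for r in results:
        # Written so that a missing (NaN) padj never passes the filter.
        if not (r.padj < fdr_cutoff):
            continue
        if r.log2_fold_change >= lfc_cutoff:
            up.append(r.gene)
        elif r.log2_fold_change <= -lfc_cutoff:
            down.append(r.gene)
    return up, down


# Invented example values.
example = [
    GeneResult("geneA", 2.3, 0.001),         # upregulated
    GeneResult("geneB", -1.0, 0.005),        # downregulated (at the cutoff)
    GeneResult("geneC", 3.0, 0.2),           # fails the FDR cutoff
    GeneResult("geneD", 0.4, 0.0001),        # change too small
    GeneResult("geneE", -4.0, float("nan")), # no adjusted P value
]
up_genes, down_genes = classify(example)
print(up_genes, down_genes)
```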
Statistics. Mean, s.d. and statistical significance based on Student’s t-test or 
Fisher’s exact test (two-tailed) were calculated in Microsoft Excel, assuming nor- 
mal distribution and similar variance. No statistical methods were used to pre- 
determine sample size. The experiments were not randomized. The investigators 
were not blinded to allocation during experiments and outcome assessment. All 
error bars show s.d. All box plots show median (line), mean (small square), 5th, 
95th (whiskers) and 25th and 75th percentile (boxes). The z-scores were calculated 
as the deviation of the mean of a single siRNA mix from the mean of all controls of the
RNAi screen, normalized to the s.d. of all controls. siRNA mixes were sorted 
according to their z-score. The dashed line in Fig. 2 and Extended Data Fig. 4 
delineates mixes with a z-score higher than two s.d. above the average value of 
all controls. 
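
The z-score defined above can be written out as a short calculation. This is a minimal sketch under stated assumptions: the authors computed their values in Microsoft Excel, the text does not say whether a sample or population s.d. was used (the sample s.d. is assumed here), and the numbers are made up for illustration.

```python
from statistics import mean, stdev
from typing import Sequence


def mix_z_score(mix_values: Sequence[float],
                control_values: Sequence[float]) -> float:
    """Deviation of a mix mean from the control mean, in control s.d. units."""
    control_mean = mean(control_values)
    control_sd = stdev(control_values)  # sample s.d. (assumption)
    return (mean(mix_values) - control_mean) / control_sd


# Made-up values, e.g. percentage of oocytes showing one defect.
controls = [10.0, 12.0, 11.0, 9.0, 13.0]
mix = [15.0, 16.0, 14.0]

z = mix_z_score(mix, controls)
is_hit = z > 2  # same threshold as the dashed line described above
print(round(z, 2), is_hit)
```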
Data analysis. Phenotypes were evaluated manually by browsing the data in Zen
(Zeiss). Defects and measurements (time points and spindle parameters) were 
then recorded on a homemade user interface in OriginPro 8.0 and processed in 
Microsoft Excel. Averages, s.d. and statistical significance were calculated in Excel. 
The z-scores were calculated as the deviation of the mean of a single mix to the 
overall mean of all controls, normalized to the s.d. of all controls. Oocytes that died 
during imaging were not analysed and do not contribute to the data set. 

For Fig. 4c, we analysed data from all 2,241 oocytes, because lagging chromo- 
somes are not very common in control oocytes, but are likely to be triggered by 
various defects such as those induced by RNAi in the screen. 

For the Jaccard index heatmap in Extended Data Fig. 9, RNAi screen pheno- 
types from both mix and control experiments were collected; wherever there were 
numerical values, they were converted appropriately into ‘yes’ and ‘no’ values 
based on the mean and s.d. of the distribution of numerical values. Further, 
‘yes’ values were categorized into ‘+’ and ‘—’ groups based on whether a numerical 
entry was greater than mean + s.d. or smaller than mean - s.d. This information 
was converted to a network representation such that there were two types of node
in the network: oocytes and phenotypes (Extended Data Fig. 9c). An edge was
made between oocyte and phenotype if a given oocyte scored ‘yes’ for a given 
phenotype. This yielded a network that we termed phenotype-oocyte network, 
which included 5,203 edges (or associations) between 53 phenotypes and 1,504 
oocytes. The distribution of the number of oocytes against the number of distinct 
phenotypes scored in them suggested that over 75% of oocytes, namely 1,195, have 
two or more phenotypes scored, suggesting that there were widespread multiple 
phenotypes scored for in the vast majority of oocytes, as expected. Hence, we 
sought to estimate the extent of co-occurring phenotypes across oocytes as a first 
step towards phenotype correlations. We calculated the Jaccard index between all 
possible pairs of phenotypes in the phenotype-oocyte network. The Jaccard index 
between phenotype i and phenotype j was defined as 


\[
J(i, j) = \frac{\lvert \{\text{oocytes exhibiting phenotype } i\} \cap \{\text{oocytes exhibiting phenotype } j\} \rvert}{\lvert \{\text{oocytes exhibiting phenotype } i\} \cup \{\text{oocytes exhibiting phenotype } j\} \rvert}
\]

where $\cap$ denotes ‘intersection’ between sets of oocytes with phenotypes i, j and
$\cup$ denotes ‘union’ between sets of oocytes with phenotypes i, j.

The above formula for the Jaccard index captures the fraction of co-occurrence
of phenotypes i and j in oocytes over the total observed number of instances of
phenotypes i or j. The numerator denotes the number of oocytes in which phenotypes
i and j were both observed, while the denominator indicates the total number of
oocytes in which either phenotype i or phenotype j was observed. The values of a Jaccard
index range between 0 and 1: 0 signifies no co-occurrence, while 1 signifies that the
two phenotypes were always observed together. We calculated the Jaccard index for all
possible pairs of phenotypes in the phenotype-oocyte network. Out of a possible 1,378
pairs (53C2), we obtained 844 pairs that displayed a Jaccard index greater than zero.
We then clustered the profile of the Jaccard index between phenotypes represented as a
matrix or table. For this purpose, we used pheatmap
(http://cran.r-project.org/web/packages/pheatmap/index.html) with ‘mean’ clustering and
‘Pearson correlation’ options. In this
way, we obtained three major clusters of phenotype correlations.
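
The Jaccard calculation described above can be illustrated with a short Python sketch. This is not the code used in the study: the function names and the toy phenotype-to-oocyte mapping are hypothetical, and the only figure taken from the text is the number of possible phenotype pairs (53 choose 2).

```python
from itertools import combinations
from math import comb


def jaccard_index(oocytes_i, oocytes_j):
    """Jaccard index of two sets of oocyte identifiers (0 if both sets are empty)."""
    union = oocytes_i | oocytes_j
    if not union:
        return 0.0
    return len(oocytes_i & oocytes_j) / len(union)


def pairwise_jaccard(phenotype_to_oocytes):
    """Jaccard index for every unordered pair of phenotypes."""
    names = sorted(phenotype_to_oocytes)
    return {
        (p, q): jaccard_index(phenotype_to_oocytes[p], phenotype_to_oocytes[q])
        for p, q in combinations(names, 2)
    }


# Hypothetical toy data: each phenotype maps to the oocytes that scored 'yes'.
toy = {
    "spindle_collapse": {"o1", "o2", "o3"},
    "chromosome_misalignment": {"o2", "o3", "o4"},
    "mi_arrest": {"o5"},
}
scores = pairwise_jaccard(toy)
nonzero_pairs = [pair for pair, value in scores.items() if value > 0]

# Number of possible pairs among the 53 phenotypes reported in the text.
possible_pairs = comb(53, 2)
```

In this toy example, only one pair shares oocytes (two shared out of four in the union). The resulting dictionary can be reshaped into a symmetric matrix before clustering.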

Measurement of oocyte diameter, spindle length and spindle width. Oocyte 
diameter, spindle length and width in metaphase I and metaphase II were measured 
using the Measurement function in Zen (Zeiss). To measure the oocyte diameter 
accurately, measurements were always taken in the centre of the oocyte as deter- 
mined by the maximum radius of the oocyte. Spindle length and width were only 
measured in oocytes in which the spindle was parallel to the confocal imaging plane. 
Extended Data Figure 1 | Efficiency of follicle growth and comparison of
oocytes grown in vitro and in vivo. a, Follicles before (top panel) and
after (bottom panel) in vitro culture. The perimeters of oocyte and granulosa
cells are highlighted on the right. The follicle diameter increases from
103.4 ± 11.3 µm to 314.1 ± 104.0 µm during in vitro culture. This lies between
the diameter of in vivo grown early antral (~248 µm) and Graafian (~424 µm)
mouse follicles. The diameter of n follicles was measured before and after
in vitro culture and is displayed as mean ± s.d. Measurements are from three
(before culture) and two (after culture) independent experiments.
b, Diameter of oocytes grown in vivo or in vitro. Data from two and seven
experiments, respectively. The box plot shows median (line), mean (small
square), 5th, 95th (whiskers) and 25th and 75th percentile (boxes). c, Efficiency
of follicle growth on different culture substrates. The numbers of independent
experiments are 343, 56, 11 and 3 from left to right. The total number of
follicles is specified above the bars. Error bars, s.d. d, Live oocyte expressing
eGFP-MAP4 (green, microtubules) and H2B-mRFP (magenta,
chromosomes). The characteristic time points of oocyte maturation that were
determined for each oocyte in the screen (2,241 oocytes in total from 70
experiments) are listed above the representative images. Quantification of
timing in e. Scale bar, 10 µm. e, The timing of bipolar spindle assembly,
chromosome alignment during meiosis I, anaphase, polar body extrusion and
chromosome alignment during meiosis II were quantified in oocytes obtained
from 5-week-old (C57BL × CBA) F1 females or in oocytes from the same
strain grown in vitro within follicles. Data from four independent experiments.
Error bars, s.d. f, Transmitted light images of blastocysts derived from fertilized
(C57BL × CBA) F1 oocytes grown in vitro within follicles (left) or in vivo
(right). Scale bar, 20 µm. Quantifications in g. g, (C57BL × CBA) F1 oocytes
grown in vitro within follicles or in vivo were denuded, matured in vitro and
fertilized. The percentages of all oocytes (fertilized and unfertilized) that
developed into two-cell embryos (two-cell from total) and two-cell embryos
that developed into blastocysts (blastocyst from total) were quantified.
Developmental rates are consistent with previous studies, in which in vitro
matured denuded oocytes were fertilized; 179 oocytes grown in vivo and 180
oocytes grown in vitro were analysed in total. Data from three independent
experiments for each group. Error bars, s.d. h, Transmitted light images of
control oocytes and oocytes microinjected with an siRNA mix targeting Zp3
together with 11 other genes (RNAi mix against Zp3) or an siRNA mix
microinjected at the same time that targeted 12 other genes (RNAi mix against
other genes). Highlighted region is magnified below. Scale bar, 10 µm.
Quantification of phenotype in i. i, The presence of the zona pellucida was
scored in oocytes microinjected with control siRNA (control), an siRNA mix
targeting one of the three Zp genes (Zp1, Zp2 or Zp3) together with 11 other
genes and an siRNA mix microinjected at the same time that targeted 12
different genes (RNAi mix against other genes). The number of analysed
oocytes is given next to bars.
Extended Data Figure 2 | Transcriptome analysis of oocytes grown in vivo
and in vitro. a-c, Transcriptome analysis of oocytes grown in vitro and in vivo.
a, Differentially expressed genes in oocytes grown in vitro based on
evaluation using the NOISeq algorithm. Transcript abundances are reported in
transcript FPKM. Only about 2% (213 out of 11,470) of genes were
differentially expressed. b, Differentially expressed genes in oocytes cultured
in vitro based on evaluation using the DESeq2 algorithm. Only about 4% (445 genes
out of 10,597) of genes were differentially expressed after applying filters in
both b and c. The blue lines indicate genes with at least twofold change in
expression. Red colour indicates differentially expressed genes with the denoted
probability. For details, see Methods. c, The overlap between NOISeq and
DESeq2 results, presented as Venn diagrams. There is over 80% overlap
in genes in either upregulated or downregulated groups for both NOISeq
and DESeq2. d, Qualitative network of phenotypes in oocytes microinjected
with siRNA mixes. Blue nodes represent siRNA mixes, purple nodes represent
phenotypes. Grey lines between mixes and phenotypes denote if at least one
oocyte microinjected with a given mix displayed the phenotype. The clusters
indicate a close relationship between a set of phenotypes and mixes. The
clusters were obtained using ClusterViz (https://code.google.com/p/clusterviz-cytoscape/)
of Cytoscape, which encodes the MCODE method to identify
clusters of closely related nodes based on the topology of the network. The
network contains six clusters identified by ClusterViz.
a [Scheme. Types of defects: General Morphological Defects; Anaphase/Division Defects; Spindle/Chromosome Defects in MII. Stages shown: Prophase Arrest, Spindle Assembly, Spindle Relocation, Polar Body Extrusion, Metaphase II Arrest.]
b
Type of Defect | Defect | Description of Defect
General Morphological Defects | More Pigment Granules; Pigment Granules only at Cortex; Pigment Granules only in Centre | Pigment granules were defined as dark spots within the oocyte’s cytoplasm and qualitatively analysed before nuclear envelope breakdown.
General Morphological Defects | Static Cytoplasm | Assessed by qualitatively analysing pigment granule movement within the cytoplasm before nuclear envelope breakdown.
General Morphological Defects | Eccentric Nucleus | Nuclei were only classified as eccentric if they were positioned within 10 µm from the cortex.
General Morphological Defects | Non-surrounded Nucleolar Chromatin Configuration | Oocytes without a perinucleolar heterochromatin rim.
General Morphological Defects | No Nuclear Envelope Breakdown | Oocytes that stayed arrested in prophase for at least 18 hours after release from prophase arrest.
Spindle Defects in Metaphase I and Metaphase II | No Spindle |
Spindle Defects in Metaphase I and Metaphase II | No Ball Stage | Oocytes that directly formed a bipolar spindle after nuclear envelope breakdown or failed to form a proper microtubule ball with the chromosomes arranged on the ball's surface.
Spindle Defects in Metaphase I and Metaphase II | Arrest in Ball Stage | Oocytes that showed a microtubule ball with chromosomes arranged in a belt-like structure for at least 18 hours after nuclear envelope breakdown.
Spindle Defects in Metaphase I and Metaphase II | Spindle Pole Defects | Comprises multi- and monopolar spindles, and spindles with altered spindle pole morphology.
Spindle Defects in Metaphase I and Metaphase II | - Multipolar Spindle | Spindles with more than two focussed poles.
Spindle Defects in Metaphase I and Metaphase II | - Monopolar Spindle | Spindles with only one focussed pole.
Spindle Defects in Metaphase I and Metaphase II | - Altered Spindle Pole Morphology | Broad or hyperfocussed spindle poles.
Spindle Defects in Metaphase I and Metaphase II | Spindle Relocation Defect (only MI) |
Spindle Defects in Metaphase I and Metaphase II | Spindle Collapse (only MI) | MI spindle microtubules that completely depolymerized before anaphase, sometimes reforming a spindle, sometimes not.
 | | categories
Chromosome Defects in Metaphase I and Metaphase II | Chromosome Aggregates |
Chromosome Defects in Metaphase I and Metaphase II | Chromosome Stretching Defect |
Chromosome Defects in Metaphase I and Metaphase II | Chromosome Misalignment | Chromosomes that were not aligned at the spindle equator.
Chromosome Defects in Metaphase I and Metaphase II | Chromosome(s) Lost in Cytoplasm | Chromosomes that were unconnected to the spindle or separated from the majority of chromosomes if a spindle was missing.
Anaphase Defects | Metaphase I Arrest | Oocytes that showed a bipolar spindle with aligned chromosomes and failed to progress into anaphase for at least 18 hours after nuclear envelope breakdown.
Anaphase Defects | | was slower than that of the majority of chromosomes
Anaphase Defects | | body extrusion
Division Defects | Cytokinesis Defects | Comprises polar body retractions, symmetric division failures and unsuccessful cytokinesis.
Division Defects | - Polar Body Retraction | Half of the chromosomes were extruded in a protrusion resembling a polar body that subsequently retracted without cell division into polar body and egg.
Division Defects | - Symmetric Division Failure | Unsuccessful cytokinesis with furrow ingression in oocyte’s centre.
Division Defects | Symmetric Division | Division into two cells of equal size.
Division Defects | Random Cortical Contractions | Simultaneous ingression of multiple furrows around the cortex during polar body extrusion.
Division Defects | Joined Spindle (only MII) | Following chromosome segregation in anaphase, chromosomes fuse again and arrange in a joined spindle.


c
Type of Numerical Value | Numerical Value | Description of Numerical Value
Meiotic Time Points | Nuclear Envelope Breakdown | Time of nuclear envelope breakdown as evident from disappearance of nucleolus.
Meiotic Time Points | Bipolar Spindle Formation | First time a clear spindle axis is visible.
Spindle Parameters | Spindle width (MI/MII) | Diameter of spindle in region of metaphase plate.
Extended Data Figure 3 | Description of defects scored in screen. a, Scheme
illustrating the main categories of defects that were quantified in the screen.
b, Table listing the main categories of defects and their subcategories as well as a
description of each defect. c, Table listing the numerical values that were
measured in the screen and a description of each numerical value.
Screen stages: Primary Screen, 2nd Round, 3rd Round, Identification
Gene | Category | Defect
Gja4 | Follicle Growth | No Growth
Zp1 | Oocyte Morphology | No Zona Pellucida
Zp2 | Oocyte Morphology | No Zona Pellucida
Zp3 | Oocyte Morphology | No Zona Pellucida
Uhrf1 | | Pigment Granule Enrichment at Cortex
Dusp7 | | No Nuclear Envelope Breakdown
Eif4enif1 | | No Nuclear Envelope Breakdown
 | Progression | MI Arrest
Aspm | | Lagging Chromosomes
Extended Data Figure 4 | Defects during meiosis II in siRNA-treated
oocytes. a, d, g, The frequency of cytokinetic defects (a), spindle defects in
metaphase II (d) and chromosome defects in metaphase II (g) were scored in
siRNA-treated oocytes. The absolute number of oocytes with each defect is given
next to bars. Data from 70 independent experiments. Corresponding control
data are shown in Extended Data Fig. 5. b, e, h, Examples of defects in live
oocytes. Chromosomes (magenta) were labelled with H2B-mRFP, microtubules
(green) with eGFP-α-tubulin. Quantifications in a, d, g. Scale bars, 10 µm.
c, f, i, The z-scores were calculated as the deviation of the mean of a single siRNA
mix to the mean of all controls of the RNAi screen, normalized to the s.d. of
all controls. siRNA mixes were sorted according to their z-score. The dashed line
delineates mixes with a z-score higher than two s.d. above the average value
of all controls. j, List of genes that were tracked down to the individual gene level
in the RNAi screen. Note that defects caused by depletion of some proteins
such as Zfp420 or Uhrf1 may reflect the function of more proximal genes under
the control of these proteins. We were able to allocate 16 out of 20 tested defects
to individual genes. Defects that could not be tracked down to individual
gene level are shown as grey bars ending after the second or third round.
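
The z-score defined in this legend (deviation of the mean of one siRNA mix from the mean of all controls, in units of the control s.d.) can be sketched as follows. This is an illustration only: the function name and the numbers are hypothetical, and the use of the sample standard deviation is an assumption, because the legend does not say which estimator was used.

```python
from statistics import mean, stdev


def mix_z_score(mix_values, control_values):
    """Deviation of the mean of one siRNA mix from the mean of all controls,
    normalized to the s.d. of all controls (sample s.d. assumed here)."""
    return (mean(mix_values) - mean(control_values)) / stdev(control_values)


# Hypothetical defect frequencies (per cent of oocytes) per experiment.
controls = [1.0, 2.0, 3.0]
mix = [5.0, 7.0]

z = mix_z_score(mix, controls)
# A mix is flagged when it lies more than two s.d. above the control mean.
is_hit = z > 2
```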
Extended Data Figure 5 | Frequency of meiosis I and meiosis II defects in
oocytes treated with control siRNAs. a-g, The frequency of scored general
morphological defects (a), spindle defects in meiosis I (b), chromosome
defects in meiosis I (c), defects in anaphase I (d), defects during cytokinesis (e),
spindle defects in meiosis II (f) and chromosome defects in meiosis II (g) were
scored in oocytes microinjected with control siRNAs. The absolute number of
oocytes with each defect is given next to bars.
Extended Data Figure 6 | Eif4enif1 is required for release from prophase
arrest and Birc5 for spindle integrity. a, Live oocytes microinjected with
control siRNA (control) or siRNAs targeting Eif4enif1 (Eif4enif1 RNAi)
expressing eGFP-α-tubulin (green, microtubules) and H2B-mRFP (magenta,
chromosomes) merged with differential interference contrast (DIC) image.
Region of spindle and chromosomes is magnified without DIC below.
Quantification of phenotype in b. Scale bar, 10 µm. b, Live oocytes
microinjected with control siRNA or Eif4enif1 siRNAs were monitored by
long-term time-lapse microscopy as shown in a and the efficiency of NEBD was
scored. The number of analysed oocytes is specified next to bars. The P value
was calculated with Fisher’s exact test. Data from a total of three experiments.
c, Live oocytes microinjected with control siRNA (control) or siRNAs targeting
Birc5 (Birc5 RNAi) expressing eGFP-α-tubulin (green, microtubules) and
H2B-mRFP (magenta, chromosomes) merged with DIC. Region of spindle and
chromosomes is magnified without DIC below. Quantification of phenotypes
in d-g. Scale bar, 10 µm. d, Live oocytes microinjected with control siRNA
(control), a mix of three different Birc5 siRNAs (siRNA 1-3) or two Birc5
siRNAs individually (siRNA 1, 2) were scored for temporary or permanent
disintegration of the meiotic spindle. The number of analysed oocytes is
specified next to bars. The P value was calculated with Fisher’s exact test
comparing control and all Birc5 siRNA microinjected oocytes from five
experiments. e-g, Live oocytes microinjected with control siRNA or Birc5
siRNAs were monitored by long-term time-lapse microscopy as shown in c and
the efficiency of NEBD (e), the presence or absence of misaligned chromosomes
(f) as well as the efficiency of chromosome segregation (g) were scored. The
number of analysed oocytes is specified next to bars. P values were calculated
with Fisher’s exact test. Data (d-g) from five independent experiments.
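
For readers unfamiliar with the test named in this legend, a two-sided Fisher's exact test on a 2×2 table of oocyte counts can be sketched in plain Python. This is not the software used in the study, which the legend does not name. The function name and example counts are hypothetical, and the two-sided convention used here (summing all tables with the same margins that are no more probable than the observed one) is one common choice and an assumption.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test P value for the 2x2 table [[a, b], [c, d]].

    Rows could be control vs RNAi oocytes; columns, defect present vs absent.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell for fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one;
    # the small tolerance guards against floating-point ties.
    return sum(
        p
        for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )


# Hypothetical counts: 3 of 3 control oocytes without the defect,
# 3 of 3 treated oocytes with it.
p_value = fisher_exact_two_sided(3, 0, 0, 3)
```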
Extended Data Figure 7 | Aspm function in mouse oocytes. a, Oocytes
microinjected with siRNAs targeting Aspm or injected with control siRNA.
Microtubules in green, chromosomes in magenta. Arrows highlight lagging
chromosomes. Quantification of phenotypes in b-g. Scale bar, 10 µm.
b, c, Lagging (b) or misaligned chromosomes (c) in oocytes microinjected with
different Aspm siRNAs. d-g, Live oocytes microinjected with control siRNA
(control) or Aspm siRNAs (Aspm RNAi) were monitored by long-term
time-lapse microscopy as shown in a and scored for progression through
anaphase (d), time of anaphase onset (e), polar body extrusion (f) and spindle
length (g). The number of analysed oocytes is specified next to bars. The P
value was calculated with Fisher’s exact test (b, c, d, f) or Student’s t-test
(e, g) comparing control and all Aspm siRNA microinjected oocytes. The
box plots in e and g show median (line), mean (small square), 5th, 95th
(whiskers) and 25th and 75th percentile (boxes). Data from four independent
experiments.
Extended Data Figure 8 | Mastl is required for metaphase II arrest and
accurate chromosome segregation, but is dispensable for cytokinesis and
chromosome condensation in mouse oocytes. a, Live oocytes microinjected
with control siRNA (control) or siRNAs targeting Mastl (Mastl RNAi)
expressing eGFP-Lamin B1 (green, nuclear lamina) and H2B-mRFP
(magenta, chromosomes) merged with DIC. Representative of 30 control and
16 Mastl RNAi oocytes. Scale bar, 10 µm. b, c, Live oocytes microinjected with a
mix of three different Mastl siRNAs expressing human Greatwall fused with
eGFP (green) and H2B-mRFP (magenta, chromosomes) merged with DIC.
eGFP-Greatwall localized to the nucleus and was released into the cytoplasm
shortly before NEBD, consistent with previous studies in mitotic cells.
Representative of 23 oocytes. Quantification in Fig. 3f. Scale bar, 10 µm.
d-g, Live oocytes microinjected with control siRNA or Mastl siRNAs were
monitored by long-term time-lapse microscopy and scored for anaphase
progression (d), time of anaphase onset (e), successful formation or retraction
of a polar body upon anaphase (f) and the prolonged presence of a midbody
upon cytokinesis (g). The number of analysed oocytes is specified next to
bars. Data from five independent experiments. h, Maximum z-projection (left)
and three-dimensional reconstruction (right) of chromosomes (Hoechst) in
fixed mouse oocytes microinjected with control siRNAs or siRNAs targeting
Mastl were generated in Imaris. Quantification in i. i, The chromosome volume
was quantified in mouse oocytes microinjected with control siRNAs or siRNAs
targeting Mastl as shown in h in Imaris. The number of analysed oocytes is
specified next to bars. Data from two independent experiments. j, Mastl mRNA
levels in control oocytes and oocytes microinjected with Mastl siRNAs were
quantified by real-time PCR. Mean values from two independent experiments.
P values were calculated with Fisher’s exact test (d, f, g) or Student’s t-test
(e, i). The box plots in e and j show median (line), mean (small square), 5th,
95th (whiskers) and 25th and 75th percentile (boxes).
Extended Data Figure 9 | Systematic analysis of phenotype correlations in
mouse oocytes. a, b, Heatmap representation of clusters of phenotypes
generated based on Jaccard indices between them. Jaccard indices, which range
between 0 and 1, were calculated as described in Methods and Extended Data
Fig. 10. Jaccard indices calculated from control oocytes (a) and RNAi-treated
oocytes (b) are shown. The ‘red’ and ‘blue’ respectively correspond to high
and low Jaccard indices as indicated by the legend. Clusters of phenotypes were
generated using pheatmap with ‘Pearson correlation’ values and ‘average’
clustering input parameters.
[Panel titles: Network representation; Jaccard index calculation and clustering]


Extended Data Figure 10 | Network of phenotypes and calculation of 
Jaccard indices. a, Network of phenotype to oocytes was converted into a 
phenotype-phenotype network based on number of oocytes that display two 
phenotypes in question. The network consists of 53 phenotypes and 867 
connections between them. The nodes in the network denote phenotypes and 
edges denote shared oocytes. This is a qualitative network and does not 
consider the strength of connection, edge weight or number of oocytes in which 
a given pair of phenotypes co-occurs. Nodes of identical colours denote a 
cluster (a group of related phenotypes based on topological properties of the 
network). Phenotypes that are not part of any cluster are in the centre and 
indicated by squares (white). Related clusters (if they share phenotypes) are 
marked by dashed circles considered as ‘superclusters’. Clusters were identified 
by the NeMo method in Cytoscape. Network clusters are purely based on 
topological properties and are in agreement with the clusters in the heatmap 
constructed using quantitative measures of Jaccard indices (Extended Data 
Fig. 9a): for example, two superclusters, top left and top right respectively, 
correspond to heatmap clusters at the top left and middle of Extended Data
Fig. 9a. b-d, Overview of computational approach with schematics to decipher
phenotype clusters. Oi, Mi and Ni correspond to oocyte i, mix i and numerical 
value of phenotype i, respectively. b, Conversion of yes, no and numerical 
data depicts the way we converted a combination of ‘yes’, ‘no’ and numerical 
data (denoted by N1, N2, N3 and N4) of phenotypes across oocytes into 
purely ‘yes’ and ‘no’ groups with the ‘yes’ group further classified as ‘yes+’ and 
‘yes—’. c, Reconstruction of the phenotype-oocyte network: we reconstructed 
a phenotype-oocyte network from the above data of ‘yes’ and ‘no’ values by 
considering only the ‘yes’ group. A nonlinear decay relationship between the 
number of phenotypes and number of oocytes in the network is displayed as 
represented by two plots. Details of the plots suggest a median value of 2 for 
phenotypes. d, Network transformation and calculation of Jaccard index matrix 
illustrate our network transformation strategy from a phenotype-oocyte 
network to a phenotype-phenotype network and the simultaneous estimation 
of Jaccard indices between phenotypes. The matrix of Jaccard indices between
phenotypes was clustered using the pheatmap package in R
with the ‘Pearson correlation’ parameter and the ‘average’ clustering method.
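
The network transformation described in panel d, from a phenotype-oocyte network to a phenotype-phenotype network, can be sketched as a projection of the ‘yes’ edges onto the phenotypes. This is a hypothetical illustration, not the authors' implementation; the function name and the toy identifiers (O1, P1 and so on) are assumptions.

```python
from collections import defaultdict
from itertools import combinations


def project_to_phenotypes(edges):
    """Convert (oocyte, phenotype) 'yes' edges into a phenotype-phenotype network.

    Returns a dict mapping each phenotype pair to the number of oocytes that
    scored 'yes' for both phenotypes. The keys alone give the qualitative
    network, which ignores edge weight.
    """
    phenotypes_by_oocyte = defaultdict(set)
    for oocyte, phenotype in edges:
        phenotypes_by_oocyte[oocyte].add(phenotype)

    shared = defaultdict(int)
    for phenotypes in phenotypes_by_oocyte.values():
        for p, q in combinations(sorted(phenotypes), 2):
            shared[(p, q)] += 1
    return dict(shared)


# Hypothetical toy network: three oocytes, three phenotypes.
toy_edges = [
    ("O1", "P1"), ("O1", "P2"),
    ("O2", "P1"), ("O2", "P2"), ("O2", "P3"),
    ("O3", "P3"),
]
phenotype_network = project_to_phenotypes(toy_edges)
```

Dividing each shared count by the size of the union of the two oocyte sets gives the Jaccard index for that pair.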
The CREB coactivator CRTC2 controls hepatic lipid
metabolism by regulating SREBP1 


Jinbo Han, Erwei Li, Liqun Chen, Yuanyuan Zhang, Fangchao Wei, Jieyuan Liu, Haiteng Deng & Yiguo Wang


Abnormal accumulation of triglycerides in the liver, caused in part
by increased de novo lipogenesis, results in non-alcoholic fatty liver
disease and insulin resistance. Sterol regulatory element-binding
protein 1 (SREBP1), an important transcriptional regulator of
lipogenesis, is synthesized as an inactive precursor that binds to
the endoplasmic reticulum (ER). In response to insulin signalling,
SREBP1 is transported from the ER to the Golgi in a COPII-dependent
manner, processed by proteases in the Golgi, and then
shuttled to the nucleus to induce lipogenic gene expression;
however, the mechanisms underlying enhanced SREBP1 activity
in insulin-resistant obesity and diabetes remain unclear. Here we
show in mice that CREB regulated transcription coactivator 2
(CRTC2) functions as a mediator of mTOR signalling to modulate
COPII-dependent SREBP1 processing. CRTC2 competes with
Sec23A, a subunit of the COPII complex, to interact with Sec31A,
another COPII subunit, thus disrupting SREBP1 transport.
During feeding, mTOR phosphorylates CRTC2 and attenuates
its inhibitory effect on COPII-dependent SREBP1 maturation.
As hepatic overexpression of an mTOR-defective CRTC2 mutant
in obese mice improved the lipogenic program and insulin sensitivity,
these results demonstrate how the transcriptional coactivator
CRTC2 regulates mTOR-mediated lipid homeostasis in the fed
state and in obesity.

Insulin resistance, which is associated with hyperglycaemia and
hypertriglyceridaemia, is the central problem of type 2 diabetes.
Although the mechanisms that underlie enhanced glucose and triglyceride
levels remain elusive, hepatic lipogenesis and gluconeogenesis
are well known to contribute to the paradoxical effects of insulin
resistance. Hepatic lipogenesis is regulated in a combinatorial
manner by transcription factors including peroxisome proliferator-activated
receptor gamma (PPARγ), liver X receptor (LXR), carbohydrate
response element binding protein (ChREBP) and sterol regulatory
element-binding protein-1c (SREBP-1c), while gluconeogenesis is
regulated by forkhead box protein O1 (FOXO1), PPARγ coactivator-1α
(PGC-1α), cAMP response element-binding protein (CREB)
and CREB regulated transcription coactivators (CRTCs) at the transcriptional
level. CRTCs have been studied extensively in glucose homeostasis,
and previous studies have also suggested possible roles for
CRTCs in lipid homeostasis. These findings prompted us to investigate
whether CRTC2 directly affects lipid levels in the liver, where
CRTC2 is highly expressed.

We measured hepatic lipid levels in Crtc2+/+ and Crtc2-/- mice.
Hepatic triglycerides, but not cholesterol, are increased by 50% in
Crtc2-/- mice compared to Crtc2+/+ mice fed with a regular diet;
the ratio is even higher under high-fat diet (HFD) feeding conditions
(Fig. 1a, b and Extended Data Fig. 1a). Transcript analysis revealed
specific upregulation of SREBP1-target genes involved in triglyceride
synthesis in Crtc2-/- mice fed with both regular and high-fat diets
(Extended Data Fig. 1b). SREBP1 (also called SREBF1) belongs to the
family of basic helix-loop-helix-leucine zipper transcription factors
synthesized as inactive precursors bound to the ER. Upon sensing
insulin stimulation or sterol depletion, SREBP1 is transported to the
Golgi through COPII-mediated vesicle trafficking, released by a two-step
proteolytic cleavage, and then shuttled to the nucleus to induce the
expression of genes involved in cholesterol and fatty acid synthesis.
On the basis of these results, we checked whether CRTC2 modulates
nuclear SREBP protein levels. In Crtc2-/- mice fed with both regular
and high-fat diets, the active, nuclear-localized SREBP1 (nSREBP1) was
significantly increased while the full-length SREBP1 (flSREBP1) was
slightly decreased (Fig. 1c), although total Srebp1c mRNA levels
remained unchanged (Extended Data Fig. 1b), suggesting that CRTC2
mediates SREBP1 activity at the post-transcriptional level. By contrast,
Crtc2 deficiency decreased insulin levels and did not modulate the activity
of SREBP2, a master regulator of cholesterol synthesis, which is
consistent with the lack of cholesterol accumulation (Extended Data
Fig. 1a). In addition, the abnormal accumulation of nSREBP1 and the
enhanced lipogenic profile in Crtc2-/- mice were normalized to the level
found in fed Crtc2+/+ mice by adenovirus-mediated wild-type (WT)
CRTC2 overexpression or knockdown of Srebp1 (Fig. 1d and Extended
Data Fig. 2). Taken together, these results suggest that CRTC2 modulates
triglyceride synthesis via regulation of SREBP1 maturation.
Figure 1 | Accumulation of hepatic lipids and mature SREBP1 in Crtc2-/-
mice. a-c, Haematoxylin and eosin sections of liver (a), levels of hepatic
triglycerides (b) and immunoblots showing hepatic amounts of full-length,
inactive SREBP1 (flSREBP1) and cleaved, active SREBP1 (nSREBP1) in liver
extracts (c) from Crtc2+/+ and Crtc2-/- mice fed with a regular diet (RD)
or high-fat diet (HFD) for 18 weeks. Scale bar, 50 µm. Data are shown as
mean ± s.e.m. *P < 0.01, n = 10. d, Effect of CRTC2 and its mutants
(CRTC2(ΔTAD), amino acids 1-630; CRTC2(ΔTAD/AA), amino acids 1-630
with double alanine mutations at serine 171 and serine 275) on SREBP1
maturation and lipogenic protein levels (FASN, SCD1 and ACACA).
Shuttling of CRTC2 between the cytoplasm and nucleus is controlled
mainly by the phosphorylation status of serine at sites 171
and 275 (refs 6, 15). To investigate whether the cellular localization
of CRTC2 affects SREBP1 maturation, we made two CRTC2 mutants:
ΔTAD (amino acids 1-630), which lacks the transactivation domain
but still shuttles between the cytoplasm and nucleus; and ΔTAD/AA,
which is confined to the nucleus because of serine-to-alanine mutations
at positions 171 and 275 (Extended Data Fig. 3a, b). Similar to
wild-type CRTC2, the ΔTAD mutant also blocks SREBP1 maturation
in Crtc2-/- mice without affecting the levels of pAKT or circulating
insulin (Fig. 1d and Extended Data Fig. 3c, d), while ΔTAD/AA was
not able to suppress the processing of SREBP1, suggesting that
SREBP1 processing depends on cytosolic but not nuclear CRTC2.

On the basis of our previous results that CRTC2 is peripherally
associated with the ER, we hypothesized that CRTC2 may bind
to proteins involved in the regulation of SREBP1 transport. Indeed,
CRTC2 binds to Sec31A, a subunit of the COPII complex, as identified
by mass spectrometry and confirmed by co-immunoprecipitation
and co-immunostaining assays (Fig. 2a and Extended Data Fig.
4a-d). Further analysis showed that Trp143 of CRTC2 is important for
the CRTC2-Sec31A interaction, since a tryptophan-to-alanine mutant
(W143A) abolished the interaction between CRTC2 and Sec31A,
without perturbing its subcellular localization and transcriptional
activity (Fig. 2b and Extended Data Fig. 4e-g). Conversely,
CRTC2 associates with a carboxy-terminal polypeptide of Sec31A
(Fig. 2c). Since the C terminus of Sec31A also mediates its interaction
with Sec23A (ref. 17), we proposed that CRTC2 may block Sec23A-Sec31A
binding. Supporting this notion, increased amounts of CRTC2
attenuate Sec23A binding to Sec31A, and vice versa (Extended
Data Fig. 4h). However, Sec23A, with higher affinity to Sec31A,
was more potent at disrupting the CRTC2-Sec31A interaction.




To establish that CRTC2 and Sec31A interact directly, we performed
His-tag pull-down assays in vitro. Wild-type CRTC2 effectively attenuated
the Sec23A-Sec31A interaction and bound less tightly to
Sec31A than Sec23A did, while CRTC2(W143A) neither bound
Sec31A nor disrupted Sec23A-Sec31A (Fig. 2d). We further performed
an in vitro budding assay to examine whether CRTC2 regulates
COPII-dependent SREBP1 transport. As expected, the wild-type but
not the Sec31A-interaction-defective mutant CRTC2 decreased
SREBP1c budding from the ER (Fig. 2e and Extended Data Fig. 5a),
which was further confirmed in mice by in vivo testing of nuclear
SREBP1 levels and lipogenic gene expression (Fig. 2f and Extended
Data Fig. 5b, c). Thus, CRTC2 negatively regulates SREBP1 processing
by competing with Sec23A for binding to Sec31A.

It is well established that the processing and nuclear activity of
SREBP1 is induced in response to insulin signalling and nutrient signalling.
We hypothesized that the CRTC2-Sec31 interaction may
modulate hormonal and nutritional activation of SREBP1. In fact, both
insulin and amino acid stimulation attenuated the CRTC2-Sec31A
interaction with concomitant enhancement of the Sec23A-Sec31A
association (Extended Data Fig. 6a, b). As shown in Extended Data
Fig. 6a, b, the regulatory effect of CRTC2 on the Sec23-Sec31A interaction
was abolished in the presence of torin1, an inhibitor of mTOR
that controls lipid metabolism and SREBP1 activation. To examine
whether mTOR directly phosphorylates CRTC2, Sec23A or
Sec31A and modulates the association of this complex, we identified
their phosphorylation sites by mass spectrometry analysis (Extended
Data Fig. 6c and data not shown for Sec23A, Sec31A). The conserved
serine site at position 136 (Ser136) of CRTC2, which occurs in
the context of a classic mTOR substrate motif (S/T-P), was
phosphorylated without rapamycin treatment and dephosphorylated
in the presence of rapamycin (Figs 2b and 3a and Extended Data
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Figure 2 | CRTC2 attenuates SREBP1 processing by competing with 
Sec23A for binding to Sec31A. a, Co-immunoprecipitation (co-IP) of 
endogenous CRTC2 with Sec31A (top) and co-immunostaining of CRTC2 and 
Sec31A (bottom) in mouse CRL-2189 cells and primary hepatocytes. Scale bars, 
10 pm. DAPI, 4,6-diamidino-2-phenylindole. b, Top: immunoblots showing 
the relative association of Sec31A with wild-type CRTC2 (WT) or CRTC2 with 
a tryptophan-to-alanine mutation at position 143 (W143A) in transfected 
HEK293T cells by co-immunoprecipitation assay. Con, mock transfection. 
Bottom: amino acid sequence alignment of vertebrate CRTC orthologues with 
the conserved Sec31A binding site (W) and mTOR phospho-site (S) boxed. 
c, Deletion analysis of regions in Sec31A required for the CRTC2-Sec31A 
interaction. Interaction-competent Sec31A polypeptides are indicated by (+) 
in each schematic. d, In vitro pull-down assay showing the binding ability of 
His-tagged full-length CRTC2, CRTC2(W143A) mutant and Sec23A to the C 
terminus of His/HA-Sec31A (amino acids 701-1220). The asterisk shows an 
unspecific protein band. e, f, Effect of wild-type (WT) and Sec31A-binding- 
defective (W143A) CRTC2 on HA-tagged SREBP1c transport from the ER to 
the Golgi in an in vitro budding assay (e) and SREBP1 processing in fed mice (f). 
The constitutive budding protein (ERGIC-53) and ER lumen proteins (GRP94 
and GRP78) are indicated as controls (e). 
Figure 3 | mTOR phosphorylates CRTC2 and 
promotes COPII-dependent SREBP1 activation. 
a, Identification of CRTC2 Ser136 phosphorylation 
by liquid chromatography mass spectrometry 
(LC-MS/MS) analysis. The MS/MS spectrum of 
Fig. 6c). In addition, a co-immunoprecipitation assay revealed that 
mTOR interacts with CRTC2, and an in vitro kinase assay showed 
that mTOR directly phosphorylates CRTC2 at Ser136 (Fig. 3b and 
Extended Data Fig. 6d). To confirm the phosphorylation of CRTC2 
at Ser136 in vivo, we raised a polyclonal antibody against the phos- 
pho-CRTC2 (Ser136) peptide and found that both insulin and 
amino acids stimulate CRTC2 Ser136 phosphorylation in an 
mTOR-dependent manner, suggesting that CRTC2 is a bona fide 
substrate of mTOR (Fig. 3c and Extended Data Fig. 6a, b). 
Phosphorylation of CRTC2 Ser136 is much more sensitive to torin1 
than to rapamycin treatment (Extended Data Fig. 6e). Although 
mTOR-defective CRTC2(S136A) had similar cellular localization 
and nuclear activity on Cre-luc to wild-type CRTC2 (Extended 
Data Fig. 6f, g), it reduced the insulin-stimulated enhancement of 
Sec23A-Sec31A association (Fig. 3c). 

Since mTOR phosphorylated CRTC2 in primary hepatocytes, 
we next checked whether CRTC2 is phosphorylated 
during the fasting-refeeding transition in mice, thereby modulating the 
Sec23A-Sec31A interaction and SREBP1 activation. Refeeding led to 
CRTC2 phosphorylation and disrupted its association with Sec31A, 
which became available for Sec23A binding, thus enhancing the 
Sec23A-Sec31A interaction; insulin had a similar but weaker effect 
(Fig. 3d). However, the dynamic regulation of Sec31A-Sec23A by 
CRTC2 was lost in Crtc2−/− mice (Fig. 3d). In addition, wild-type 
CRTC2 normalized nSREBP1 in Crtc2−/− mice during both fasting 
and refeeding to the level in Crtc2+/+ mice, and was phosphorylated 
during refeeding (Fig. 3e). The CRTC2(S136A) mutant had a stronger 
inhibitory effect on SREBP1 processing and lipogenic gene expression 
during refeeding owing to deficient phosphorylation by mTOR (Fig. 3e 
and Extended Data Fig. 7a, b). Furthermore, CRTC2(S136A) did not 
affect the outcome of torin1 treatment on SREBP1 activation, hepatic 
triglyceride levels, lipin1 expression or cellular localization (Extended 
Data Fig. 7c-e), suggesting that both CRTC2 and lipin1, as mTOR 
downstream mediators, regulate SREBP1 activation at different 
steps. Taken together, these results demonstrate that mTOR modulates 
COPII-dependent SREBP1 processing via Ser136 phosphorylation 
of CRTC2. 

Considering that lipogenesis is chronically enhanced in obesity and 
diabetes, we tested whether the mTOR-CRTC2 axis is altered in 
this setting. Consistent with previous studies, hepatic nuclear 
SREBP1, mTOR activity, and levels of branched-chain amino acids 
and triglycerides were all increased in HFD-fed, db/db and ob/ob mice 
(Fig. 4a and Extended Data Fig. 8). Meanwhile, the Sec31A-Sec23A 
interaction and the level of phospho-CRTC2(Ser136) were enhanced 
after HFD feeding. Accordingly, the association of CRTC2-Sec31A 
was blocked, although the protein level of COPII subunits remained 
stable (Fig. 4a and Extended Data Fig. 8c). Since the CRTC2(S136A) 
mutant blocks SREBP1 processing, we asked whether this mTOR-defective 
mutant was able to reduce elevated hepatic triglyceride levels 
by inhibiting SREBP1 activation in obesity. To exclude a gluconeogenic 
effect of CRTC2(S136A), we used an mTOR-defective CRTC2 
mutant without the transactivation domain (CRTC2(ΔTAD/S136A)) 
to restore SREBP1 processing in HFD-fed mice. CRTC2(ΔTAD/S136A) 
did not affect mouse body weight, fat mass, food intake, energy 
expenditure and liver function measured by alanine aminotransferase 
(ALT) and aspartate aminotransferase (AST) activity, but it reduced 
SREBP1 processing, the lipogenesis rate and gluconeogenic capacity, 
as well as improving hepatic steatosis and insulin sensitivity (Fig. 4b-e 
and Extended Data Fig. 9a-f). To investigate further the role of 
CRTC2(ΔTAD/S136A) in insulin sensitivity, we performed 
hyperinsulinaemic-euglycaemic clamp studies. The steady-state glucose 
infusion rate almost doubled during the clamp studies in 
CRTC2(ΔTAD/S136A) mice, reflecting enhanced whole-body insulin responsiveness, 
and was accompanied by an increase in the glucose disposal rate 
(Fig. 4f, g). The insulin-stimulated glucose disposal rate, which prim- 
arily reflects skeletal muscle insulin sensitivity, and the insulin- 
induced suppression of plasma free fatty acid levels, which indicates 
white adipose tissue insulin sensitivity, were both slightly increased in 
the presence of hepatic CRTC2(ΔTAD/S136A), suggesting a possible 
role of inter-organ communication in orchestrating systemic insulin 
sensitivity. However, the insulin sensitivity in liver was markedly 
improved as judged from insulin-induced suppression of hepatic glu- 
cose production and further confirmed by pAKT levels (Fig. 4h and 
Extended Data Fig. 9g-k). 

Previous studies showed that nuclear CRTC2, as a transcriptional 
coactivator, plays an important part in gluconeogenesis during fasting 
or ER stress via its binding to different partners. Here, our results 
demonstrate that cytosolic CRTC2, as a critical mediator of mTOR, 
modulates COPII activity, which leads to SREBP1 processing and 
enhancement of de novo lipogenesis (Fig. 4i), thereby contributing 
to hepatic lipid levels and insulin resistance along with potential altera- 
tions in free fatty acid supply from either dietary or adipose tissue 
lipolysis. Our findings expand the role of the transcription coactivator 
CRTC2 to include lipid metabolism, and provide insight into how 
SREBP1 activity is enhanced in obesity and diabetes. Considering the 
isoform diversity of COPII subunits and CRTCs, as well as the environmental 
cues that activate mTOR, mTOR-CRTC signalling may 
have important functions in regulating other cargo transport processes 
in different settings. Therefore, it will be interesting to explore the 
selectivity and specificity of this signalling axis in the future. 


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
METHODS 


Mouse strains. Mice were housed in colony cages with a 12 h light/dark cycle in a 
temperature-controlled environment. For high-fat diet feeding experiments, regular 
diet (Research Diets, D12450B) was replaced with a diet containing 60 kcal% fat 
(Research Diets, D12492). Crtc2−/− mice have been described previously. 
Heterozygous Crtc2+/− mice were backcrossed with C57BL/6J for ten generations 
and then intercrossed to obtain Crtc2−/− mice confirmed by PCR-based genotyping. 
All the other mice were purchased from Jackson Labs (Bar Harbor, ME). All 
animal experiments were approved by the Animal Care and Use Committee at 
Tsinghua University. 

Plasmids and adenoviruses. Plasmids containing human Sec13, Sec31A and 
Sec23A were provided by W. E. Balch (The Scripps Research Institute). HA-tagged 
S6K (8984) and Myc-tagged mTOR (1861) plasmids were from Addgene. 
Myc-DDK-tagged lipin1 (RC207138) plasmid was purchased from OriGene. 
Adenoviruses (1 × 10° plaque forming units (pfu) GFP, CRTC2, CRTC2(W143A), 
CRTC2(S136A), CRTC2(ΔTAD) (amino acids 1-630), CRTC2(ΔTAD/AA) 
(S171A, S275A), CRTC2(ΔTAD/S136A), Srebp1 RNAi, or unspecific RNAi) were 
delivered to male C57BL/6J, HFD-fed, Crtc2+/+, or Crtc2−/− mice by tail vein 
injection. Mice were injected with adenovirus on day 0 and killed on day 7. 
Wild-type CRTC2, Crtc2 RNAi, GFP and unspecific RNAi adenoviruses have been 
described previously. CRTC2(W143A), CRTC2(S136A), CRTC2(ΔTAD), 
CRTC2(ΔTAD/AA) and CRTC2(ΔTAD/S136A) adenoviruses were constructed 
from mouse Crtc2 cDNA. Srebp1 RNAi adenovirus was constructed using the 
sequence 5′-GGGATCAAAGAGGAGCCAGTGC-3′. All expressed constructs 
used in this study were confirmed by sequencing. 

In vivo analysis and histology. Triglyceride (Sigma, TR0100) and cholesterol 
(BioVision, K603-100) levels in liver and plasma, plasma insulin (Mercodia, 
10-1247-01), plasma alanine aminotransferase (ALT, BioVision, K752-100), 
plasma aspartate aminotransferase (AST, BioVision, K753-100), hepatic 
branched-chain amino acids (BCAA, Sigma, MAK003) and plasma free fatty acid 
levels (BioVision, K612-100) were measured according to the manufacturer's 
instructions. Blood glucose values were determined using a LifeScan automatic 
glucometer. Insulin tolerance tests, glucose tolerance tests and pyruvate 
tolerance tests were conducted as previously reported. De novo lipogenesis 
experiments were performed as previously reported. Pieces of liver tissue 
were sent to the Metabolomics Center at Tsinghua University to determine the 
²H₂O-incorporated palmitate levels by liquid chromatography and mass 
spectrometry. Hyperinsulinaemic-euglycaemic clamps were performed as previously 
described. Three days after adenovirus administration, mice were implanted 
with dual jugular catheters for 4 days for use in clamp studies. Food intake and 
energy expenditure were simultaneously measured for individually housed mice 
with a PhenoMaster system (TSE Systems). Relative fat mass was measured with 
an EchoMRI analyser. For histology, mouse tissues were fixed in 4% 
paraformaldehyde and paraffin embedded. Sections (8 μm) were used for haematoxylin and 
eosin staining. 

Quantitative PCR. Total RNA from mouse liver was extracted using RNeasy 
kits (Qiagen). cDNA was obtained by the iScript cDNA synthesis kit (Bio-Rad). 
RNA levels were measured with the LightCycler 480 II (Roche) as previously 
described. The following primers were used for qPCR: Acaca-forward 
GGATGACAGGCTTGCAGCTAT, Acaca-reverse TTTGTGCAACTAGGAAC 
GTAAGTCG; Acox1-forward CTGCCAAGGGACTCCAGAGCAGCT, Acox1- 
reverse GACATGGACACATCCACCATGCAG; Actin-forward GTCCACCCC 
GGGGAAGGTGA, Actin-reverse AGGCCTCAGACCTGGGCCATT; Apoa4- 
forward GCCCCATGCCAACAAAGTAA, Apoa4-reverse CCTTGATCGTGG 
TCTGCATG; Chrebp1-forward CTCAGGGTATGCAACCCAGGTG, Chrebp1- 
reverse GACAGGGGTTGTTGTCTCTGGC; Fasn-forward TGGTGGTGTGG 
ACATGGTCACAGA, Fasn-reverse CCGAAGCTGGGGGTCCATTGTGTG; 
Gpat1-forward GCCCTTCGTGGGAAGGTGCTGCTA, Gpat1-reverse CCGTC 
TCGCCAGCCATCCTCTGTG; Mttp-forward GAGCGGCTATACAAGCTCA 
CGTAC, Mttp-reverse TCACCATCAGGATTCCTCCACAGT; PPARα-forward 
TCTCCACGTTCCAGCCCTTCCTCA, PPARa-reverse TTCACATGCGTGA 
ACTCCGTAGTG; PPARy-forward TCCGTGATGGAAGACCACTCGCAT, 
PPARy-reverse CAGCAACCATTGGGTCAGCTCTTG; Scd1-forward CTGTA 
CGGGATCATACTGGTTCCC, Scd1-reverse CAGCCGAGCCTTGTAAGTTC 
TGTG; Srebp1c-forward GGAGCCATGGATTGCACATT, Srebp1c-reverse GG 
CCCGGGAAGTCACTGT; Srebp2-forward GATGAGCTGACTCTCGGGGA 
CATC, Srebp2-reverse GTGGGGTAGGAGAGACTTTGACCT. 
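
Several primer sequences in the list above are split by line wrapping, so it can help to normalize them before ordering or checking them. The following is a minimal sketch, not part of the published method: the helper names (`clean_primer`, `gc_content`, `reverse_complement`) are hypothetical, and only the two Srebp1c primers from the list are used as input.

```python
# Hypothetical helpers for sanity-checking primer sequences copied from the list above.

def clean_primer(seq: str) -> str:
    """Remove whitespace introduced by line wrapping and upper-case the sequence."""
    return "".join(seq.split()).upper()


def gc_content(seq: str) -> float:
    """Return the percentage of G and C bases in the cleaned sequence."""
    s = clean_primer(seq)
    return 100.0 * sum(base in "GC" for base in s) / len(s)


_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of the cleaned sequence."""
    return clean_primer(seq).translate(_COMPLEMENT)[::-1]


# Sequences as printed in the primer list; the reverse primer was wrapped mid-sequence.
primers = {
    "Srebp1c-forward": "GGAGCCATGGATTGCACATT",
    "Srebp1c-reverse": "GG CCCGGGAAGTCACTGT",
}

for name, raw in primers.items():
    seq = clean_primer(raw)
    print(f"{name}: {seq} length={len(seq)} GC={gc_content(seq):.1f}%")
```

The same helpers apply unchanged to the other primer pairs once their wrapped halves are pasted together.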

Cell culture and luciferase assay. CRL-2189 and HEK293T (ATCC) cells were 
cultured in DMEM containing 10% FBS (HyClone), 100 mg ml−1 penicillin-streptomycin. 
Mouse primary hepatocytes were isolated and cultured as previously 
described. For reporter studies, Ad-Cre-luc-infected hepatocytes (1 pfu per 
cell) or Cre-luc-transfected HEK293T cells were exposed to forskolin (10 μM) 
for 4 h. All cell lines were routinely tested for mycoplasma presence using a 
PCR detection kit (Sigma, MP0035). 

Immunoblot, immunoprecipitation and immunostaining. Assays were performed 
as described. CRTC2, pCREB, CREB, AKT, pAKT (Thr308), tubulin, 
HA and Flag antibodies were as previously described. Other antibodies were 
purchased as follows: rabbit polyclonal anti-Sec31A (A302-336A), Bethyl 
Laboratories; mouse monoclonal anti-Sec31A (612351), BD Biosciences; 
anti-KDEL (ab12223) and anti-Sec12 (ab3422), Abcam; anti-ERGIC-53 (GTX63674), 
GeneTex; anti-His (D291-3), MBL International; anti-Sar1 (07-692), Millipore; 
anti-SREBP1 (SC-13551, SC-367), anti-Myc (SC-40), Santa Cruz; anti-SCAP 
(13102), anti-ACACA (3662), anti-FASN (3180), anti-SCD1 (2438), anti-Sec23A 
(8162), anti-S6K (2708), anti-pS6K (9234), anti-mTOR (2983), anti-pmTOR 
(5536) and anti-lipin1 (5195), Cell Signaling Technology; anti-Sec24A (15958-1-AP) 
and anti-SREBP2 (14508-1-AP), Proteintech Group Inc.; anti-Sec13 
(NBP2-20278), Novus Biologicals. The phospho-S136 CRTC2 antibody was generated 
and purified by Beijing Prorevo Biotech Co., Ltd. 

In vitro budding assay. Mouse liver cytosol was harvested as described. In brief, 
mouse livers were perfused with 0.9% (w/v) NaCl through the portal vein at room 
temperature. The perfused livers were disrupted in ice-cold buffer (20 mM HEPES 
pH 7.4, 150 mM KOAc, 5 mM MgOAc, 250 mM sorbitol) followed by 10 strokes in 
a 50 ml Dounce homogenizer. Homogenates were centrifuged at 1,000g for 
10 min. Supernatants were sequentially centrifuged at 20,000g for 20 min, 
186,000g for 1 h, and 186,000g for 45 min. The final supernatant was dialysed, 
divided into multiple aliquots and stored at −80 °C. 

In vitro budding assays were carried out as reported. HEK293T cells were 
co-transfected with HA-SREBP1c and Flag-SCAP (provided by P. Li, Tsinghua 
University) for 48 h. Cells were then cultured with lipid-deficient medium for 
another 3 h, harvested and permeabilized by 5 min incubation in buffer (20 mM 
HEPES pH 7.4, 150 mM KOAc, 5 mM MgOAc, 250 mM sorbitol) with 40 μg ml−1 
digitonin. The permeabilized cells were washed with the same buffer and used at 
40 μg protein per reaction. The budding reaction (4 mg ml−1 mouse liver cytosol, 
1 mM ATP, 40 mM creatine phosphate, 0.2 mg ml−1 creatine phosphokinase, 
0.1 mM GTP, and 100 ng His-CRTC2 or His-CRTC2(W143A) fusion proteins 
purified from Escherichia coli) was assembled on ice, incubated for 30 min at 30 °C 
and then put on ice. The donor membranes were removed by centrifugation at 
12,000g for 20 min at 4 °C. The supernatant fraction was centrifuged at 
55,000 rpm for 25 min at 4 °C using a TLA100 rotor in a Beckman Optima TLX 
ultracentrifuge. The collected vesicles were analysed by SDS-PAGE. 

In vitro kinase assay. His-tagged CRTC2, CRTC2(S136A) and S6K fusion proteins 
were purified from E. coli. The kinase assay was performed as reported. The 
reaction system (20 μl), containing 150 ng fusion protein, 20 ng truncated mTOR 
(Millipore, 14-770) in reaction buffer (25 mM HEPES pH 7.4, 50 mM KCl, 5 mM 
MgCl2, and 5 mM MnCl2), 50 μM cold ATP and 2 μCi [γ-32P]ATP, was incubated 
for 30 min at 30 °C. Reactions were stopped by adding 5 μl sample buffer, then 
boiled for 10 min and analysed by SDS-PAGE followed by autoradiography. 

Mass spectrometry (MS). To identify CRTC2-interacting proteins, the ER fraction 
of CRL-2189 cells was isolated according to the manufacturer's instructions 
(Sigma-Aldrich, ER0100). Immunoprecipitates of endogenous CRTC2 from the 
ER fraction were prepared for MS studies as previously reported and analysed by 
electrospray ionization tandem MS on a Thermo LTQ Orbitrap instrument. To 
identify the mTOR phospho-site(s) in CRTC2, HEK293T cells were transfected with 
Flag-CRTC2 and treated with or without 100 nM rapamycin for 1 h. 
Immunoprecipitates of Flag-CRTC2 were used to determine phospho-site(s) by MS. 

Statistical analyses. Age- and weight-matched male mice were randomly assigned 
for the experiments. The animal numbers used for all experiments are outlined in 
the corresponding figure legends. No animals were excluded from statistical 
analyses, and the investigators were not blinded in the studies. All studies were 
performed on at least three independent occasions. Results are reported as 
mean ± s.e.m. Comparison of different groups was carried out using two-tailed unpaired 
Student's t-test. Differences were considered statistically significant at P < 0.05. 
No statistical methods were used to predetermine sample size. 
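
The summary statistics named above (mean ± s.e.m. and an unpaired Student's t statistic) can be illustrated with a short sketch. This is not the authors' analysis code: the function names and the example values are hypothetical, and the pooled-variance (equal-variance) form of the t-test is assumed. The two-tailed P value would then be read from a t distribution with the returned degrees of freedom, for instance via `scipy.stats.ttest_ind`.

```python
import math
import statistics


def mean_sem(values):
    """Return (mean, standard error of the mean) for one group."""
    mean = statistics.mean(values)
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, sem


def unpaired_t(a, b):
    """Return (t statistic, degrees of freedom) for an unpaired Student's t-test.

    Assumes equal variances in both groups (classic pooled-variance form).
    """
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / df
    se_diff = math.sqrt(pooled_var * (1 / na + 1 / nb))
    t = (statistics.mean(a) - statistics.mean(b)) / se_diff
    return t, df


# Hypothetical measurements for two groups (illustrative values only).
control = [1.0, 2.0, 3.0, 4.0, 5.0]
treated = [2.0, 4.0, 6.0, 8.0, 10.0]

m, sem = mean_sem(control)
t, df = unpaired_t(control, treated)
print(f"control: {m:.2f} ± {sem:.2f} (mean ± s.e.m.)")
print(f"t = {t:.3f}, df = {df}")
```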


26. Zhao, X. et al. Regulation of lipogenesis by cyclin-dependent kinase 8-mediated 
control of SREBP-1. J. Clin. Invest. 122, 2417-2427 (2012). 

27. Li, P. et al. Adipocyte NCoR knockout decreases PPARγ phosphorylation and 
enhances PPARγ activity and insulin sensitivity. Cell 147, 815-826 (2011). 

28. Nohturfft, A., Yabe, D., Goldstein, J. L., Brown, M. S. & Espenshade, P. J. Regulated 
step in cholesterol feedback localized to budding of SCAP from ER membranes. 
Cell 102, 315-323 (2000). 

29. Schindler, A. J. & Schekman, R. In vitro reconstitution of ER-stress induced ATF6 
transport in COPII vesicles. Proc. Natl Acad. Sci. USA 106, 17775-17780 (2009). 


©2015 Macmillan Publishers Limited. All rights reserved 


Extended Data Figure 1 | Profiling of cholesterol, gene expression, 
protein and circulating insulin levels in Crtc2+/+ and Crtc2−/− mice. 
a-d, Hepatic cholesterol levels (a), qPCR results for expression of genes 
involved in lipogenic regulation, lipid transport, fatty acid oxidation and 
triglyceride synthesis (b), plasma insulin level (c), and immunoblots showing 
hepatic amounts of full-length, inactive SREBP2 (flSREBP2) and cleaved, active 
SREBP2 (nSREBP2), phospho-CREB (pCREB), CREB, and subunits of the 
COPII complex (Sec12, Sar1, Sec23A, Sec24A, Sec13 and Sec31A) (d) in 
Crtc2+/+ and Crtc2−/− mice fed with a regular diet (RD) or a high-fat diet 
(HFD) for 18 weeks. Data are shown as mean ± s.e.m. *P < 0.01, n = 10. 
Extended Data Figure 2 | Validation of Srebp1 knockdown in mice. a-c, Hepatic triglycerides (TGs) (a), immunoblots (b) and qPCR results for lipogenic gene 
expression (c) showing the effect of Srebp1 RNAi in liver extracts of fed mice. Data are shown as mean ± s.e.m. *P < 0.01, n = 10. US, unspecific. 
Extended Data Figure 3 | Effect of CRTC2 and its mutants on 
gluconeogenic and lipogenic gene expression. a, Cellular localization of 
CRTC2 and its mutants CRTC2(ΔTAD) (amino acids 1-630) and 
CRTC2(ΔTAD/AA) (amino acids 1-630 with double alanine mutations at 
Ser171 and Ser275) in mouse primary hepatocytes. FSK, forskolin. Scale bar, 
10 μm. b, Effect of CRTC2 and its mutants on Cre-luc activity in mouse 
primary hepatocytes treated with or without 10 μM FSK for 4 h. Data are 
shown as mean ± s.e.m. *P < 0.01, n = 6. c, d, Effect of CRTC2 and its mutants 
on lipogenic gene (Fasn, Scd1, Acaca) expression (c) and plasma insulin level 
(d) in fed mice. Data are shown as mean ± s.e.m. *P < 0.01, n = 8. NS, no 
significant statistical difference. 
Extended Data Figure 4 | Association of CRTC2 with Sec31A. a, Immunostaining 
showing relative co-localization of CRTC2 with an endoplasmic 
reticulum (ER) marker (KDEL) in CRL-2189 cells. Scale bar, 10 μm. 
b, c, Strategy to purify CRTC2-interacting proteins (b), and the peptides 
identified from Sec31A and Sec13 (c) by MS analysis of immunoprecipitates 
prepared with anti-CRTC2 antibody from CRL-2189 ER fractions. d, 
Co-immunoprecipitation assay showing amounts of Flag-tagged Sec13 or Sec31A 
recovered from immunoprecipitates of endogenous CRTC2 in HEK293T cells. 
e, Deletion analysis of regions in CRTC2 required for the CRTC2-Sec31A 
interaction. Interaction-competent CRTC2 peptides are indicated by (+) in 
each schematic. f, g, Cellular localization of the tryptophan-to-alanine 
mutant of CRTC2 (W143A) and its effect on Cre-luc activity in HEK293T cells. 
Scale bars, 10 μm. Data are shown as mean ± s.e.m. *P < 0.01, n = 6. NS, 
no significant statistical difference. h, Co-immunoprecipitation assay showing 
amounts of Flag-tagged CRTC2 and YFP-tagged Sec23A recovered from 
immunoprecipitates of HA-tagged Sec31A in HEK293T cells. 
Extended Data Figure 5 | Modulation of COPII-dependent SREBP1 activity 
by CRTC2. a, Immunostaining showing the effect of Crtc2 RNAi on the 
cellular localization of Sec31A. Scale bars, 10 μm. b, c, Effect of wild-type (WT) 
and Sec31A-interaction-defective (W143A) CRTC2 on lipogenic gene 
expression (b) and plasma insulin level (c) in fed mice. Data are shown as 
mean ± s.e.m. *P < 0.01, n = 8. NS, no significant statistical difference. 
Extended Data Figure 6 | Characterization of CRTC2 phosphorylation 
site(s) by mTOR. a, Immunoblots showing co-immunoprecipitation of 
CRTC2 and Sec23A with Sec31A in mouse primary hepatocytes in response to 
insulin and/or torin1 treatment. Mouse primary hepatocytes were incubated 
with 250 nM torin1 or control vehicle for 1 h before 30 min insulin (100 nM) 
stimulation. Phospho-S6K (pS6K), total S6K, phospho-AKT (pAKT), total 
AKT and phospho-CRTC2 (Ser136) levels are also indicated. b, Immunoblots 
showing co-immunoprecipitation of CRTC2 and Sec23A with Sec31A in 
mouse primary hepatocytes in response to amino acids and/or torin1 
treatment. Mouse primary hepatocytes incubated with amino-acid-free MEM 
for 3 h were exposed to 250 nM torin1 or control vehicle for another 1 h, then 
treated with amino acids for 30 min. c, Phospho-peptides of Flag-tagged 
CRTC2 identified by MS analysis of immunoprecipitates prepared with 
anti-Flag from HEK293T cells treated with 100 nM rapamycin for 1 h (Rap+) 
or not (Rap−). Serine 136 was phosphorylated (Yes) in the absence of Rap 
treatment (Rap−) and dephosphorylated (No) in the presence of Rap (Rap+). 
d, Co-immunoprecipitation assay showing the association between Flag-tagged 
CRTC2 and Myc-tagged mTOR in HEK293T cells. e, Effect of the mTOR 
inhibitors Rap and torin1 on CRTC2 phosphorylation. Mouse primary 
hepatocytes were pretreated with vehicle (Veh), 100 nM Rap, or 250 nM torin1 
for 1 h before 100 nM insulin stimulation for 30 min. f, g, Cellular localization 
of the phosphorylation-defective CRTC2 mutant (S136A) (f) and its effect 
on Cre-luc activity (g) in mouse primary hepatocytes. Scale bars, 10 μm. Data 
are shown as mean + s.e.m. *P < 0.01, n = 6. NS, no significant statistical 
difference. 
Extended Data Figure 7 | Effect of CRTC2(S136A) on SREBP1 maturation, 
lipin1 localization and circulating insulin level. a, b, Effect of wild-type or 
CRTC2(S136A) on lipogenic gene expression in liver (a) and plasma insulin 
level (b) of fasted (3 h) and refed (1 h after 3 h fasting) mice. c-e, Effect of 
CRTC2(S136A) and torin1 treatment on SREBP1 maturation (c), hepatic 
triglycerides (d) and lipin1 localization in mouse primary hepatocytes (e). 
Torin1 (20 mg kg−1) was intraperitoneally injected 6 h before livers were 
harvested. For lipin1 localization, mouse primary hepatocytes were treated 
with vehicle (Torin1−) or 250 nM torin1 (Torin1+) for 4 h. Scale bars, 10 μm. 
Data are shown as mean ± s.e.m. *P < 0.01, n = 8. NS, no significant 
statistical difference. 
Extended Data Figure 8 | Enhanced SREBP1 activation, triglyceride levels 
and branched-chain amino acid levels in obese mice. a-c, Immunoblots 
showing relative amounts and/or phosphorylation status of SREBP1, SREBP2, 
SCAP, mTOR, S6K, CRTC2, AKT and COPII subunits in fed lean and db/db 
mice (a), ob/ob mice (b), and relative amounts of SREBP2, SCAP and COPII 
subunits in HFD-fed mice (c). d-g, Hepatic triglyceride amounts and 
branched-chain amino acid (BCAA) levels in liver extracts from lean, 
db/db, ob/ob and HFD-fed mice in the fed state. Data are shown as 
mean ± s.e.m. *P < 0.01, n = 10. 
Extended Data Figure 9 | Improved insulin sensitivity in HFD-fed mice in 
the presence of CRTC2(ΔTAD/S136A). a-k, Effect of the mTOR-defective 
mutant CRTC2(ΔTAD/S136A) on metabolic parameters (a), including body 
weight, relative fat mass, food intake, plasma alanine aminotransferase (ALT) 
and aspartate aminotransferase (AST) activity, plasma cholesterol, plasma 
triglycerides, plasma insulin and blood glucose; energy expenditure (b); 
lipogenic gene expression (c); glucose tolerance (d); insulin tolerance (e); 
pyruvate tolerance (f); hepatic glucose production (HGP; g); insulin-stimulated 
glucose disposal rate (IS-GDR; h); percentage of free fatty acid (FFA) 
suppression (i); pAKT level in skeletal muscle (j); and pAKT level in epididymal 
white adipose tissue (k) from mice fed on a HFD for 18 weeks. Data are 
shown as mean ± s.e.m. *P < 0.01, **P < 0.05, n = 8 (a-f), n = 6 (g-i). 

Crucial HSP70 co-chaperone complex unlocks 
metazoan protein disaggregation 

Protein aggregates are the hallmark of stressed and ageing cells, 
and characterize several pathophysiological states. Healthy metazoan 
cells effectively eliminate intracellular protein aggregates, 
indicating that efficient disaggregation and/or degradation 
mechanisms exist. However, metazoans lack the key heat-shock 
protein disaggregase HSP100 of non-metazoan HSP70-dependent 
protein disaggregation systems, and the human HSP70 system 
alone, even with the crucial HSP110 nucleotide exchange factor, 
has poor disaggregation activity in vitro. This unresolved conundrum 
is central to protein quality control biology. Here we show 
that synergic cooperation between complexed J-protein co-chaper- 
ones of classes A and B unleashes highly efficient protein disaggre- 
gation activity in human and nematode HSP70 systems. Metazoan 
mixed-class J-protein complexes are transient, involve comple- 
mentary charged regions conserved in the J-domains and 
carboxy-terminal domains of each J-protein class, and are flexible 
with respect to subunit composition. Complex formation allows 
J-proteins to initiate transient higher order chaperone structures 
involving HSP70 and interacting nucleotide exchange factors. 
A network of cooperative class A and B J-protein interactions 
therefore provides the metazoan HSP70 machinery with powerful, 
flexible, and finely regulatable disaggregase activity and a further 
level of regulation crucial for cellular protein quality control. 

To investigate the possibility of a potent protein disaggregation 
activity in metazoans, we focused on the HSP70 chaperone system, 
which displays some in vitro capacity to disentangle and refold aggregated 
polypeptides when powered by an HSP110 co-chaperone. 
The HSP70-J-protein-HSP110 functional cycle described in 
Extended Data Fig. 1a, by generally accepted extrapolation, occurs 
on protein aggregate surfaces. Homodimeric J-proteins are essential 
components of this cycle. Three classes of J-proteins (A, B and C), 
with >50 members in humans, determine HSP70 substrate selection, 
with some functional redundancy among members. For example, 
class A and B J-proteins (Fig. 1a) implicated in protein quality control 
have common functions, but independent and differing efficacies. 
The basis for the evolutionary maintenance of these two classes of 
J-proteins (despite appreciable internal diversity), and the relation 
of class to function and principles governing substrate selection, remain 
unknown. 

Here we explore the full potential of the metazoan HSP70-J-protein- 
HSP110 system in protein disaggregation, by examining the func- 
tional relationship between class A and B J-proteins. Using thermally 
denatured luciferase from Photinus pyralis as a model substrate, we 
investigate the in vitro protein disaggregation/refolding versus pro- 
tein refolding-only (Extended Data Fig. 1b-d) capacities of the human 
and Caenorhabditis elegans HSP70-HSP110 systems (also known as 
HSPA8-HSPH2 in humans, and HSP-1-HSP-110 in C. elegans) in 
conjunction with class A and B J-proteins (Fig. 1, Extended Data 
Fig. 1e and Extended Data Table 1). 

In disaggregation/refolding reactions with a class A (JA2) and B 
(JB1) J-protein present together (Fig. 1b, magenta), rather than either 
class J-protein alone (Fig. 1b, green or blue), we observed unpreced- 
ented reactivation of pre-formed heat-aggregated luciferase, indicating 
synergistically accelerated protein disaggregation. This was also seen 
under limiting chaperone concentrations (maintained in all further 
experiments) with multiple class A (JA1 and JA2) and class B (JB1 
and JB4) human J-proteins (Extended Data Figs 1f and 2a). Disagg- 
regation reactions with the corresponding nematode HSP-1-HSP-110 
system and J-proteins DNJ-12 (class A) and DNJ-13 (class B) show 
similar synergic acceleration (Fig. 1c and Extended Data Fig. 2c, d). In 
reactions containing only one J-protein class (Extended Data Fig. 1f, 
JA2, solid lines; or JB1, dashed lines), with increased J-protein levels of 
threefold or more relative to the mixed-class J-protein reaction 
(Extended Data Fig. 1f, magenta), protein disaggregation/refolding 
slows and is inhibited. We infer that the presence of class A and B 
J-proteins together, rather than J-protein amount, determines reaction 
efficiency. Both the disaggregation/refolding rate (Extended Data Fig. 
2e, f) and yield (Extended Data Fig. 2g) of renatured luciferase peak 
with equal proportions of class A to B. A broad range of flanking 
reciprocal A to B J-protein stoichiometries also show appreciable activ- 
ity, suggesting that efficient disaggregation/refolding is supported by 
minimal amounts of preferentially interacting A and B J-proteins. 
Increased initial rates at higher stoichiometries of JA2 (Extended 
Data Fig. 2f) reflect intrinsically higher refolding capacity of class A 
J-proteins with HSP70 (Extended Data Fig. 2b, green). 

Disaggregation synergy in mixed J-protein class reactions occurs 
with and without small HSP (Saccharomyces cerevisiae Hsp26) incorporation 
into aggregates for both human (Extended Data Figs 1f and 3a) 
and nematode J-protein containing systems (Fig. 1c and Extended 
Data Fig. 2d). Synergy is independent of nucleotide exchange factors 
(NEFs) (Extended Data Fig. 3b), protein substrate (Fig. 1b and 
Extended Data Fig. 3c, d) and substrate concentration variations 
affecting density, size and therefore the architectural nature of the 
aggregate generated (Extended Data Fig. 3e). Synergy also occurs at 
lower chaperone to substrate ratios (Fig. 1b and Extended Data Figs 1f 
and 3f), and at different and characteristic ranges of substrate to 
J-protein ratio for malate dehydrogenase (MDH) versus luciferase or 
α-glucosidase disaggregation (Fig. 1b and Extended Data Fig. 3c, d). 
MDH aggregates resolve considerably with non-limiting concentra- 
tions of JB1 alone (not shown), but with limiting JB1 concentrations in 
the presence of JA2, synergic MDH disaggregation occurs (Extended 
Data Fig. 3d). Synergy in disaggregation therefore appears generic, 
operating over a range of ratios and concentrations, with room for 
substrate-linked variation. By contrast, refolding-only reactions show 
Figure 1 | Simultaneous presence of class A and B J-proteins unleashes 
protein disaggregation activity and broadens target aggregate range of the 
HSP70 machinery. a, Two distinct classes (A and B) display highly conserved 
domain organization involving the HSP70-interacting HPD motif (red) 
containing amino-terminal J-domain (JD), Gly/Phe-rich flexible region (G/F), 
C-terminal β-sandwich domains (CTD-I and II), with class A J-proteins 
distinguished mainly by a zinc-finger-like region (ZFLR) that inserts into the 
CTD-I subdomain and a dimerization domain (D). CTD together with ZFLR 
provide substrate specificity. b, Disaggregation and reactivation of preformed 
luciferase aggregates using human HSP70-HSP110 with human J-proteins JA2 
(green), JB1 (blue), JA2+JB1 (magenta) or with no J-proteins (black) (n = 3). 

c, Reactivation of heat-aggregated luciferase by nematode HSP70 machinery 


no synergism (Extended Data Fig. 2b). We conclude that efficient 
protein disaggregation, but not refolding, requires cooperation 
between class A and B J-proteins. 

Three non-exclusive mechanisms could explain the synergistic action 
of class A and B J-proteins. In a mechanism involving sequential action, 
one J-protein class interacts with HSP70-HSP110 to extract polypeptides 
from aggregates. The other J-protein class then prevents re-aggregation 
of extracted polypeptide (holdase function) and/or in combination with 
HSP70-HSP110 promotes substrate refolding. Of the four J-proteins 
tested for holdase function, only JA2 and JB4 prevent luciferase aggrega- 
tion at 42°C (Extended Data Fig. 3g, h). However, disaggregation 
synergy is indistinguishable for J-protein combinations with (JA2 or 
JB4) and without (JA1 or JB1) holdase function (Extended Data Fig. 
2a). Furthermore, disaggregation/refolding rates are unaffected by the 
order of JA2 and JB1 addition during the reaction (Extended Data Fig. 
3i), indicating that J-proteins act in no strict order. For direct validation, 
we quantified tritium-labelled luciferase extracted from aggregates using 
a mutant GroEL protein (GroEL^D87K) as a trap for extracted luciferase 
molecules, preventing refolding. Decreased luciferase activity in 
disaggregation/refolding reactions in the presence of GroEL^D87K reflects 
trapping of labelled disaggregated polypeptides (Extended Data Fig. 4a, b), 
counted by measuring tritium scintillation (Fig. 1d). Disaggregation/ 
refolding reactions containing only one class of J-protein show similar 
amounts of trapped ³H-labelled luciferase polypeptides. With class A and 
B J-proteins present together, however, we see synergistically accelerated 
accumulation of disaggregated ³H-labelled luciferase trapped in GroEL 
(Fig. 1d). Together, these results exclude a strictly sequential function of 
J-protein classes in disaggregation/refolding, corroborating the inference 
that synergy occurs at the protein disaggregation step. 

A second model stipulates that each J-protein class acts specifically, 
in parallel, distinguishing protein aggregates by size and/or compact- 
ness during the disaggregation step. We tested this by adding different 
J-protein-HSP70-HSP110 mixtures to preformed ³H-labelled luciferase 
aggregates, which display a range of sizes, and probably variations 


248 | NATURE | VOL 524 | 13 AUGUST 2015 



containing HSP-1, HSP-110 and the nematode J-proteins DNJ-12 (A) and DNJ-13 (B), 
either alone or in combination (n = 2). d, Fold change in 
trapped luciferase; control, GroEL^D87K without other chaperones (black). Values 
normalized to total ³H counts in each reaction (n = 2). e, SEC profile after 
disaggregation/refolding (120 min) with either J-protein alone or combined. 
Elution fractions labelled F1-F4 (red lines); F4, disaggregated monomers 
(~63 kDa). f, Aggregate quantification for fractions F1-F4 from the SEC profile in 
e. Disappearance of ³H-luciferase from aggregates (F1-F3) occurs with 
concomitant accumulation of disaggregated monomer (F4). g, Aggregate 
quantification, after 40-min disaggregation. Values normalized to total counts in 
each reaction. Two-tailed t-test, **P < 0.01, ***P < 0.001 (n = 3). Data are 
mean ± s.e.m. Precise concentrations are shown in Extended Data Table 1. 


in molecular architecture. We analysed the disaggregation of aggregate 
populations by size-exclusion chromatography (SEC; Fig. 1e-g and 
Extended Data Fig. 4c, d). Reactions were run in parallel, stopped by 
depleting ATP with apyrase, and held on ice until SEC (Extended Data 
Fig. 4e). Eluted fractions (F1-F4, Fig. 1e-g) reveal JA2-containing 
chaperone mixes preferentially solubilize smaller aggregates (F3; 
~200-700 kilodaltons (kDa)). Conversely, JB1-containing mixes 
preferentially solubilize larger aggregates (F1, ≥5,000 kDa; F2, 
~700-4,000 kDa), but solubilize small aggregates less efficiently. 
These results are consistent with distinct, parallel class activity. JA2 
plus JB1 combinations, however, in much shorter reactions (40 min 
instead of 120 min), solubilize both larger and smaller aggregates 
far more efficiently than the added efficiencies of separate JA2 and 
JB1 reactions allow (Fig. 1g). Similar results obtain throughout for 
α-glucosidase aggregate solubilization (Extended Data Fig. 4d). This 
suggests concerted action on the same target. 
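
The comparison against "added efficiencies" can be written as a simple ratio of the mixed-class result to the sum of the single-class results. The sketch below is illustrative only: the function name and the example percentages are hypothetical and are not values from this study. It assumes solubilization is expressed as the percentage of a given aggregate fraction removed, and it ignores the fact that the mixed reactions were run for a shorter time, which would make the real comparison more conservative.

```python
def synergy_ratio(solubilized_a, solubilized_b, solubilized_mix):
    """Return observed mixed-class solubilization divided by the additive expectation.

    All arguments are percentages of aggregate solubilized (hypothetical inputs).
    A ratio above 1 indicates a more-than-additive (synergistic) effect.
    """
    expected_additive = solubilized_a + solubilized_b
    if expected_additive <= 0:
        raise ValueError("additive expectation must be positive")
    return solubilized_mix / expected_additive


# Hypothetical example: class A alone 10%, class B alone 15%, mixture 60%.
ratio = synergy_ratio(10.0, 15.0, 60.0)
print(f"mixed-class result is {ratio:.1f}x the additive expectation")
```

A ratio near 1 would be expected for two classes acting independently in parallel on separate targets; values well above 1 are what motivates the concerted-action interpretation.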

This prompts a third model, in which synergy results from the 
formation of mixed-class J-protein complexes exerting concerted 
activity to facilitate disaggregation. A range of approaches rigorously 
tests this model. 

To visualize individual versus complexed J-protein function, we 
biased disaggregation/refolding reactions by combining JA2:JB1 in 5:1 
to 1:5 ratios, then analysed aggregate resolution by SEC (Extended Data 
Fig. 4f). The 1:1 ratios dissolve all aggregates (F1-F3, magenta). In con- 
trast, limiting JB1 concentration and excess JA2 in shorter reactions 
(40 min, orange solid) barely resolves the largest aggregates (F1), whereas 
the smaller aggregates (F2-F3) disappear completely; F1 aggregates 
resolve only in longer reactions (120 min, orange hash). Limiting JB1 
concentrations alone, however, readily resolve large F1 aggregates (blue 
solid). We infer that scarce JB1 molecules preferentially sequester with 
excess JA2 into complexes that efficiently process all sizes of aggregates; 
the smaller F2 and F3 aggregates accordingly disappear first. Reciprocal 
titration with scarce JA2 and excess JB1 concentration shows less dis- 
aggregation of the smaller F2 and F3 aggregates (magenta versus red 
solid, Extended Data Fig. 4f), which fully resolve with a longer reaction 
time (120 min, red hash). Specific J-protein stoichiometries evidently 
modulate HSP70 targeting and disaggregation efficacy. We infer that 
J-proteins preferentially form efficient mixed-class complexes, support- 
ing a model for concerted action. 

Independent tests for physical interactions between class A and B 
J-proteins consistently reveal intermolecular J-domain-C-terminal- 
domain (JD-CTD) and CTD-CTD contacts. Approaches include 
chemical cross-linking coupled to mass spectrometry (Fig. 2a), 
Förster resonance energy transfer (FRET; Fig. 2b), docking simulations 
(Fig. 2c, d) and competition assays (Fig. 2e). 

Mass spectrometry of JA2 and JB1 combinations treated with lysine- 
specific cross-linker (disuccinimidyl suberate) identifies three intermolecular 
cross-linked regions between JD^JA2-CTD^JB1, JD^JB1-CTD^JA2 and 
CTD^JA2-CTD^JB1 (Fig. 2a and Extended Data Fig. 5a, b). FRET measured 
by donor quenching indicates JD-CTD and CTD-CTD interactions 
between JA2 and JB1 in solution (Fig. 2b, J-protein pairs 1, 2 and 3; 
Extended Data Fig. 6a). This corroborates our cross-linking data and 
favours biological relevance. We detect neither JD-JD interactions 
between classes (J-protein pair 4), nor intermolecular same-class 
JD-CTD interactions (J-protein pair 5). However, in agreement with 
structures from small-angle X-ray scattering of class B J-proteins, we 
detect JD^JB1-CTD^JB1 cross-links (not shown). Presumably these reflect 
intramolecular interactions, preventing intermolecular JD^JB1-CTD^JB1 
but not JD^JA2-CTD^JB1 interactions, as indicated by FRET (Fig. 2b). 
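
For readers unfamiliar with donor quenching, FRET efficiency is commonly estimated as E = 1 - F_DA/F_D, where F_D is donor fluorescence without acceptor and F_DA is donor fluorescence with acceptor present. The sketch below assumes this standard definition; whether the authors applied exactly this formula, and with which corrections, is not stated in this section. The function name and the intensity values are hypothetical.

```python
def donor_quenching_efficiency(f_donor_only, f_donor_with_acceptor):
    """Estimate FRET efficiency from donor quenching: E = 1 - F_DA / F_D.

    f_donor_only:          donor fluorescence without acceptor (arbitrary units)
    f_donor_with_acceptor: donor fluorescence with acceptor present (same units)
    """
    if f_donor_only <= 0:
        raise ValueError("donor-only fluorescence must be positive")
    return 1.0 - f_donor_with_acceptor / f_donor_only


# Hypothetical intensities: a drop from 1000 to 800 units on adding acceptor.
efficiency = donor_quenching_efficiency(1000.0, 800.0)
print(f"donor quenching efficiency: {efficiency:.0%}")
```

An efficiency near zero, as reported here for the JD-JD and same-class JD-CTD pairs, means the acceptor did not measurably quench the donor.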

We further defined the interface of the JA2-JB1 complex using 
unbiased docking simulations between J-domain and CTD dimers 
of JA1, JA2, JB1 and JB4 (Fig. 2c, d and Extended Data Fig. 7a, b). 
Simulations show a preferred binding arrangement of JD^JB1 on 
CTD^JA2 and conversely JD^JA2 on CTD^JB1 (Fig. 2c, d), again 
corroborating cross-linking data (Fig. 2a). 

Furthermore, in competition experiments, the addition of moderate 
excess of isolated J-domain fragments inhibits JA2-JB1-HSP70- 
HSP110-dependent disaggregation/refolding of heat-aggregated 
luciferase (Fig. 2e), although not refolding alone (Fig. 2f). J-domain 
fragments carrying the HPD motif mutated to QPN, which abolishes 
the JD-HSP70 interaction and ATP hydrolysis stimulation on HSP70 
(refs 18, 19), have the same effect (Extended Data Fig. 6e), confirming 
that inhibition of disaggregation is not due to HSP70 being titrated out 
by J-domain fragment binding. Unlabelled full-length J-proteins 
and isolated J-domains compete with mixed-class JD-CTD interac- 
tions, indicated by decreased FRET efficiency between JA2 and JB1 
(Extended Data Fig. 6f, g), explaining the inhibitory effects. However, 
JD-CTD interaction sites do not overlap CTD binding sites for sub- 
strate, since JA2 holdase activity remains unaffected with isolated 
J-domains present (Extended Data Fig. 7c, d). Molecular docking 
modelling supports this also (Extended Data Fig. 8). J-protein com- 
plexing involving mixed-class J-domains and CTDs is therefore crucial 
for efficient protein disaggregation, but not for refolding. 

Non-ionic detergent affects neither disaggregation activity (Extended 
Data Fig. 6b) nor FRET efficiency between class A and B molecules 
(Extended Data Fig. 6a). Increasing salt concentrations, however, 
weaken both (Extended Data Fig. 6c, d), suggesting ionic interactions. 
Independent methodologies therefore confirm specific JD-CTD 
interactions of a predominantly electrostatic nature directly impli- 
cated in disaggregation efficiency. 

J-domain and CTD regions display highly conserved, class-specific 
electrostatic potentials (Fig. 3a, b). Class A J-proteins show distinct 
polarity in the CTDs, with negatively charged regions (red) in the 
CTD-II and dimerization subdomains, and positively charged regions 
(cyan) along the zinc-finger-like region and CTD-I hook (Fig. 3a). 
Conversely, class B CTDs are relatively non-polar, with positively 
charged regions in the CTD where JD^JA2 cross-linking occurs. 
J-domains in both classes are markedly bipolar, although class A 
J-domains have smaller negatively charged regions (Fig. 3b). 
In all J-domains, positive charge (near the HPD motif and helix-II) 
is implicated in binding to HSP70 (refs 18, 19). We deduce conserved 
negatively charged regions exposed in the J-domains interact with 
positively charged CTD regions in opposite class J-proteins. 

We therefore generated triple charge-reversal variants of the 
J-domain (JA2^RRR or JB1^RRR), replacing negatively charged Asp or 
Glu residues with positively charged Arg residues in and around 
helices-I and -IV (Fig. 3c). FRET interactions between the JD^JA2 and 
CTD^JB1 regions diminish with charge-reversal mutations in either JD^JA2 
or JD^JB1 (Fig. 3d, J-protein pairs 2 and 3), and are abrogated with charge 
reversals in both interacting J-domains (Fig. 3d, J-protein pair 4). Partial 
FRET reduction with triple charge reversals in only one interacting JD- 
CTD domain pair suggests some degree of intermolecular tethering by 
the other pair, although insufficient for full J-protein cooperation and 
disaggregation efficiency (Fig. 3e). In refolding-only reactions, recovered 
luciferase activity remains unaffected by J-domain charge reversals 
(Fig. 3f). Physically complexed and cooperating mixed-class J-proteins 
are therefore essential for efficient HSP70-dependent disaggregase activ- 
ity, and are thought (but not directly shown) to act on the surface of 


Figure 2 | Intermolecular JD-CTD interaction is 
required for mixed-class J-protein complex 
formation. a, Intermolecular cross-links (dashed 
lines) between Lys residues (orange) on JA2 (green) 
and JB1 (blue). b, JA2 and JB1 interactions analysed 
by FRET. Bars show donor quenching efficiency of 
JD-CTD interactions; cartoons below show 
fluorophore positions in J-protein protomer pairs 
1-5. N-termini of JD^JA2 and JD^JB1 are labelled with 
acceptor fluorophore ReAsH. CTD^JA2 and CTD^JB1 
are labelled with donor fluorophores FlAsH and 
Alexa Fluor 488 at residues 241 and 278, respectively 
(n = 3). c, d, Ribbon diagrams showing 
representative positions of JDs on CTD dimers from 
docking simulations; cross-linked Lys residues 
(space filling, orange, connected with black dashed 
lines) established in a; HPD motif (stick 
representation, red). c, JD^JB1 (blue) and CTD^JA2 
(green). d, JD^JA2 (green) and CTD^JB1 (blue). 
e, f, Competition of excess isolated JD fragments for 
class A and B J-protein complex formation and 
effect on luciferase disaggregation. Protein 
disaggregation/refolding (e) and refolding-only 
(f) (n = 3). Data are mean ± s.e.m. Precise 
concentrations are shown in Extended Data Table 1. 
Figure 3 | Conserved electrostatic potential distributions in A and B 
J-protein classes are complementary and direct mixed-class J-protein 
interactions for complex formation. a, Electrostatic isopotential maps of CTD 
dimers comparing human (JA1, JA2, JB1 and JB4) and nematode (DNJ-12 and 
DNJ-13) class A and B J-proteins. Electrostatic potential around proteins is 
contoured at +1 (positive, cyan) and −1 (negative, red) kcal mol⁻¹ e⁻¹. Protein 
structures are represented by ribbon diagrams. b, Conserved α-helices and 
electrostatic isopotential maps contoured as in a of human and nematode 
J-domains. I-IV (from N-terminus) denote conserved α-helices. c, The J-domains 
of charge-reversal triple mutants (JA2^RRR and JB1^RRR); and their electrostatic 


aggregates. A paradoxical sequence dispensability of these highly con- 
served helices-I and -IV observed in early data, which assayed exclu- 
sively for HSP70 interaction and protein folding’*’®, is also now 
explained. These data together strongly support a mixed-class 
J-protein interaction with vital function conserved in evolution. 

Size separation of tritiated JB1 mixed with unlabelled, larger JA2, or 
the reciprocal labelling, reveals only JB1 (blue) or JA2 (green) homo- 



isopotential maps compare with wild-types in b. RRR denotes triple amino acid 
substitutions D6R, E61R and E64R in JA2, and D4R, E69R and E70R in JB1. 
d, FRET determination of JA2 and JB1 triple charge-reversal mutants (n = 3). 
Bars show donor quenching efficiency of JD-CTD interactions; cartoons below 
show fluorophore positions in J-protein protomer pairs 1-4. Triple charge 
mutants are yellow. e, Luciferase disaggregation/refolding at 120 min with 
J-domain charge-reversal mutants (JA2: D6R (R); E61R+E64R (RR); 
D6R+E61R+E64R (RRR). JB1: D4R (R); E69R+E70R (RR); D4R+E69R+E70R 
(RRR)) (n = 3). f, As in e, refolding-only at 80 min (n = 3). Data are 
mean ± s.e.m. Precise concentrations are shown in Extended Data Table 1. 


dimers (Extended Data Fig. 5c), indicating that J-protein complexes 
are transient. Transient interactions would support an HSP70 
disaggregation machinery with a flexible range of tailored activities. 
Single-class J-protein function shows HSP70-HSP110-mediated dis- 
aggregation activity limited to aggregates of specific size ranges (Fig. 4, 
large or small aggregates). Mixed-class J-protein complexes efficiently 
disaggregate a wide range of aggregate sizes (Fig. 4, large, medium and 


Figure 4 | Model of individual versus 
complexed class A and class B J-protein 
function in protein disaggregation. Size- 
specific aggregate targeting: large aggregates 
are targeted by J-protein^class B-HSP70- 
HSP110 (blue); small aggregates are 
targeted by J-protein^class A-HSP70-HSP110 
(green); all aggregate sizes are targeted by 
J-protein-mixed-class-complex-HSP70- 
HSP110 (magenta). HSP70 molecules are in 
grey. Sequential reaction steps (encircled 
numbers): 1, J-protein targets aggregate; 
[...] (dashed red arrows); and 4, polypeptide 
extraction leading to protein disaggregation. 
Chaperone recruitment denoted by 
dashed black arrows. 


©2015 Macmillan Publishers Limited. All rights reserved 


small). On the basis of our results, we reason a minimum complex 
consists of one class A J-protein homodimer binding to one class B 
homodimer in a 1:1 ratio, indicating that there are four J-domains per 
complex. Assuming two J-domains engage in interactions sufficient to 
complex the J-proteins, one J-domain per homodimer remains free to 
interact with one HSP70, allowing for recruitment of two interacting 
HSP70 molecules per complex without steric hindrance (Fig. 4, med- 
ium aggregates, Extended Data Fig. 8). We conclude that each mixed 
class J-protein complex recruits at least two HSP70 molecules per 
targeting event, possibly seeding dynamic, higher order chaperone 
assemblies on aggregate surfaces. 

Our computational models of the structures of mixed class J-protein 
complexes (Extended Data Fig. 8) incorporate the constraints defined by 
all our cross-linking, FRET and docking data. In each model, space in 
the J-protein complex allows for substrate binding via several interfaces, 
HSP70 interaction with J-domains, and HSP110 interaction with each 
HSP70 protein. These models accommodate the concept of entropic 
pulling, in which HSP70 binding to entangled polypeptides decreases 
entropy, generating reciprocal forces that pull polypeptides from aggre- 
gates*'. Such higher order chaperone complexes would be expected to 
increase pulling forces and stabilize disaggregating polypeptides by pro- 
viding increased substrate binding surface, thereby accelerating protein 
disaggregation (Fig. 4; class A+B complex). Although also likely, direct 
verification of mixed-class J-protein-HSP70 complexes interspersed 
with single-class J-protein-HSP70 complexes on aggregate surfaces is 
currently experimentally intractable. 

In summary, we demonstrate potent protein disaggregation activity in 
metazoans, mediated by the central HSP70-J-protein-HSP110 chaper- 
one network. Disaggregation efficacy comparable to that of non-meta- 
zoan HSP100-HSP70 bi-chaperone systems, over a broad aggregate size 
range, requires transient physical interaction between class A and B 
J-proteins. The assembly of higher order chaperone complexes on pro- 
tein aggregate surfaces is expected to increase coordinated pulling forces 
on multiple trapped polypeptides, providing a plausible mechanistic 
basis for increased disaggregation efficacy. Mixed-class J-protein com- 
plexes form preferentially and interact with HSP70-HSP110 to resolve a 
broad range of aggregates efficiently, whereas single-class J-protein— 
HSP70-HSP110 interaction targets specific aggregate sizes. This suggests 
intracellular J-protein stoichiometry will differentially regulate HSP70- 
dependent protein disaggregation efficiency. The transitory nature of 
J-protein complexes would, in this context, facilitate flexible response 
according to need. As in nematodes, human cytosol contains several 
members of J-protein classes: four class A and nine class B J-proteins’. 
A wide range of complexed J-protein combinations is therefore available 
in humans and other metazoa, providing flexible target selectivity. This 
opens the further possibility of physiological function in assembly/dis- 
assembly of other macromolecular cell structures. These findings may 
also impinge on the amorphous, oligomeric, most toxic prefibrillar phase 
of amyloidic fibre formation characterizing neurodegenerative diseases”. 
Overall, our work identifies a physically interacting J-protein network 
that adds another level of functional flexibility to cellular protein quality 
control. The underlying functional basis for hitherto unexplained evolu- 
tionary maintenance of distinct J-protein classes now also becomes clear. 
In essence, we reveal a J-protein gearbox regulating efficacy of protein 
disaggregation and consequently, refolding reactions, with fundamental 
effect on the cellular physiology, and therefore health, of metazoan 
organisms. 


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
METHODS 

Plasmids and protein purification. Clones of human J-proteins (DNAJA1, 
DNAJA2, DNAJB1 and DNAJB4) were obtained either directly from Addgene 
or as gifts from H. Kampinga in pcDNA5/FRT/TO plasmids. C. elegans dnj-12, 
dnj-13, C30C11.4 (hsp-110) and hsp-1 genes were amplified by PCR using com- 
plementary DNA preparations from heat-shocked three-day-old animals as a 
template. The above-mentioned genes were then recloned into protein expression 
vector pCA528 or pSUMO with a 6×His-Smt3 tag as previously described. 
Mutants of J-proteins were generated by standard PCR mutagenesis techniques 
and verified by sequencing. JA2 and JB1 variants for N-terminal FlAsH and 
ReAsH labelling were generated by incorporating the Cys-Cys-Pro-Gly-Cys-Cys 
tag (5’-TGTTGTCCAGGGTGCTGC-3’) after N-terminal methionine. The same 
labelling motif was generated in JA2 CTD by inserting two Cys residues before 
Pro241 and two Cys residues before Val243. The JB1 CTD labelling mutant was 
generated by mutating Gly278 to Cys. To obtain isolated J-domains, the J-domains 
of JA2 (1-77 amino acids) and JB1 (1-76 amino acids) were PCR-amplified with 
a C-terminal TAG site and cloned into pCA528. HPD motifs were mutated to 
QPN by changing H36Q+D38N and H32Q+D34N in isolated J-domains of JA2 
and JB1, respectively. Purification of J-proteins and their variants was performed 
by affinity (Ni-IDA, Macherey-Nagel; Ni-NTA, Pierce), size-exclusion and 
ion-exchange chromatographic methods. In brief, BL21(DE3)/pRARE Escherichia coli 
strains carrying the corresponding expression vectors were induced for protein 
expression with 0.5 mM isopropyl-1-thio-D-galactopyranoside (IPTG, Sigma- 
Aldrich) for 3 h at 30 °C. All C. elegans chaperones were expressed at 20 °C with 
1 mM IPTG overnight. Cells were lysed either in 50 mM HEPES-KOH, pH 7.5, 
750 mM KCl, 5 mM MgCl2, protease inhibitor cocktail (Roche), 2 mM 
phenylmethylsulphonyl fluoride and 10% glycerol (for human J-protein purifications), or 
in 30 mM HEPES-KOH, pH 7.4, 500 mM K-acetate, 5 mM MgCl2, 1 mM 
β-mercaptoethanol, 2 mM phenylmethylsulphonyl fluoride, protease inhibitor cocktail 
(Roche) and 10% glycerol (for nematode chaperone purifications). After 
centrifugation at 30,000g (30 min, 4 °C) the resulting supernatants were applied to a 
Ni-NTA/Ni-IDA matrix and incubated for 60 min at 4 °C. Subsequent washing steps 
were performed with high-salt buffers (50 mM HEPES-KOH, pH 7.5, 750 and 
500 mM KCl, 5 mM MgCl2 and 10% glycerol) for human J-protein purifications. 
Worm chaperones were first washed in high-salt buffer (30 mM HEPES-KOH, 
pH 7.4, 1 M K-acetate, 5 mM MgCl2, 1 mM β-mercaptoethanol, 2 mM 
phenylmethylsulphonyl fluoride and 10% glycerol), followed by a low-salt wash (identical 
to the high-salt buffer with 50 mM instead of 1 M K-acetate). Protein elution was 
performed with 300 mM imidazole in the corresponding low-salt buffers. Dialysis 
was performed overnight at 4 °C in the presence of 4 µg His-tagged Ulp1 per mg 
substrate protein for proteolytic cleavage of the 6×His-Smt3 tag. The 
6×His-Smt3 tag and His-Ulp1 were removed by incubating the dialysed proteins 
in Ni-NTA/Ni-IDA matrix for 60 min at 4 °C. The targeted proteins were further 
purified using Superdex 200 (human J-proteins and their variants), ion exchange 
using the Resource Q (anion exchange for DNJ-12, HSP-1, HSP-110 and isolated 
human J-domain fragments) or Resource S (cation exchange for DNJ-13) columns 
(GE Healthcare). Firefly luciferase and human HSPA8 and HSPH2 were purified 
as previously described. Pyruvate kinase and α-glucosidase were purchased from 
Sigma-Aldrich. Pig heart muscle MDH was purchased from Roche. 

Luciferase refolding and disaggregation/refolding assays. For refolding-only 
assays, 20 nM luciferase plus 750 nM HSPA8, 40 nM HSPH2, 380 nM J-protein 
and 100 nM Hsp26 was incubated at 42 °C for 10 min in HKM buffer (50 mM 
HEPES-KOH, pH 7.5, 50 mM KCl, 5 mM MgCl2, 2 mM dithiothreitol (DTT), 
2 mM ATP, pH 7.0 and 10 µM BSA) to generate thermally denatured luciferase. 
Luciferase refolding was initiated by adding an ATP regenerating system (3 mM 
phosphoenolpyruvate and 20 ng µl⁻¹ pyruvate kinase) and by shifting the reaction 
to 30 °C. Luciferase activity was measured at the indicated time points with a 
Lumat LB 9507 luminometer (Berthold Technologies) by transferring 1 µl of 
sample to 100 µl of assay buffer (25 mM glycylglycine, pH 7.4, 5 mM ATP, pH 7, 
100 mM KCl and 15 mM MgCl2) mixed with 100 µl of 0.25 mM luciferin. 
Luciferase aggregates for disaggregation/refolding reactions were generated as 
previously described. Either 25 nM or 2 µM luciferase with fivefold excess of 
Hsp26 was aggregated at 45 °C for 15 min. Protein disaggregation/refolding was 
initiated by adding the indicated chaperone mixtures to preformed luciferase 
aggregates and shifting the reaction temperature to 30 °C. Luciferase disaggrega- 
tion/refolding assays with C. elegans HSP70 chaperone system were performed at 
either 20°C or 22 °C (in Fig. 1c and Extended Data Fig. 2c, d). 

In cells from bacteria to human, aggregates that form under thermal stress 
contain small HSPs (sHSPs) such as S. cerevisiae Hsp26. sHSP incorporation 
facilitates aggregate resolution through inherent chaperone holdase activity, pos- 
sibly changing the density, size and architecture of aggregates, making these more 
accessible and manageable for disaggregation machineries*”’. The in vitro assay 
system incorporating Hsp26 therefore more closely approximates the situation in 


the cell. We used S. cerevisiae Hsp26 because it is the only sHSP induced and 
activated in yeast cells after heat shock, and has been extensively characterized 
in vivo and in vitro**’*®, Hsp26 is therefore the generic heat-induced sHSP of yeast, 
which justifies its use for our study. By contrast, there are 10 human sHSPs, each 
with different substrate binding specificities and affinities*”°, and there is no clear 
basis for choosing one above another. Also, some of these, including those recog- 
nizing luciferase as substrate, interact to form hetero-oligomers, which display yet 
further different properties” resulting in complicated assay analysis. Furthermore, 
the activation mechanism of at least one of these is subject to controversy (phos- 
phorylation and dephosphorylation have both been reported”; P. Goloubinoff, 
personal communication). 

α-glucosidase disaggregation assay. α-glucosidase aggregation was achieved by 
incubating 50 nM of substrate with 500 nM Hsp26 in HKM buffer without DTT at 
50 °C for 15 min. Disaggregation/refolding of aggregates was initiated by adding 
indicated chaperone mixtures supplemented with an ATP regenerating system 
and by shifting the reaction to 30 °C. Reactivation of α-glucosidase was measured 
with an α-glucosidase assay kit from Abnova using a FLUOstar Omega 
plate-reader (BMG LABTECH). 

MDH disaggregation assay. Pig heart muscle MDH (Roche) aggregation was 
achieved by incubating 150 nM of substrate with 750 nM Hsp26 in HKM buffer at 
47°C for 30 min. Disaggregation/refolding of aggregates was initiated by adding 
indicated chaperone mixtures supplemented with an ATP regenerating system 
and by shifting the reaction to 30°C. MDH activity was measured using a pot- 
assium phosphate (150 mM, pH 7.6) buffer containing 1 mM oxaloacetate, 2mM 
DTT and 0.56 mM NADH. Activity measurements were taken using a FLUOstar 
Omega plate-reader. Refolding rates were calculated from the linear increase of 
substrate activities. 
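
As an illustration of how a refolding rate can be taken from the linear increase in activity, the sketch below fits an ordinary least-squares line to time points from the linear phase and reports its slope. This is an assumption about the fitting procedure, which the text does not specify; the function name and the example data are hypothetical.

```python
def initial_rate(times_min, activities):
    """Least-squares slope of activity versus time (activity units per minute).

    Only time points from the linear phase of the reaction should be passed in;
    choosing that window is left to the user.
    """
    n = len(times_min)
    if n < 2 or n != len(activities):
        raise ValueError("need at least two paired time/activity points")
    mean_t = sum(times_min) / n
    mean_a = sum(activities) / n
    # Sum of squared deviations in time and the time-activity cross term.
    sxx = sum((t - mean_t) ** 2 for t in times_min)
    if sxx == 0:
        raise ValueError("time points must not all be identical")
    sxy = sum((t - mean_t) * (a - mean_a) for t, a in zip(times_min, activities))
    return sxy / sxx


# Hypothetical data: activity (% of native enzyme) rising 0.5% per minute.
rate = initial_rate([0, 10, 20, 30], [0.0, 5.0, 10.0, 15.0])
print(f"refolding rate: {rate:.2f} % per min")
```

Rates obtained this way are comparable between reactions only if the same time window and the same activity normalization are used for each.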

SEC and aggregate profiling. Tritium (³H) labelling of firefly luciferase, 
α-glucosidase, JA2 and JB1 was performed with N-succinimidyl-[2,3-³H]propionate 
(Hartmann Analytic) according to the manufacturer’s guidelines. Unincorporated 
N-succinimidyl-[2,3-³H]propionate was removed using dialysis with HKM buffer 
plus either 150 mM (for luciferase and α-glucosidase) or 500 mM (for J-proteins) 
KCl at 4 °C, overnight. ³H-labelled luciferase and α-glucosidase were 
aggregated as described in the disaggregation/refolding assays. Luciferase and 
α-glucosidase reactivation were performed by adding specified chaperone cocktails 
and incubating at 30 °C. Reactions were quenched with apyrase (0.8 µg µl⁻¹) at 
40 or 120 min and placed on ice. Aggregated luciferase/α-glucosidase complexes 
were separated using an ÄKTA purifier system with a Superose 6 Tricorn column 
(10/300 GL, GE Healthcare Life Sciences). Samples were centrifuged at 9,000g for 
5 min at 4 °C before loading. Running buffer contained 50 mM HEPES-KOH, 
pH 7.5, 150 mM KCl, 5 mM MgCl₂, 2 mM DTT (0.2 mM DTT for α-glucosidase) 
and 10% glycerol. A similar approach was used to separate ³H-labelled J-protein 
dimers, with the exception of using a buffer with 50 mM KCl. The ³H signal in each 
fraction (500 µl) was quantified by scintillation counting (Beckman LS6000 IC). 
The amount of ³H-luciferase trapped in GroELᴰ⁸⁷ᴷ was calculated by subtracting 
the total counts between elution volumes 11 and 16 in reactions without the trap 
from that of the reaction containing the trap (10 µM). The ³H signal in each elution 
fraction was normalized to the total counts of the corresponding SEC run after 
background subtraction, and presented as a percentage of the total counts (F1-F4). 
A SEC standard (BIO-RAD) was used to determine the size of the elution peaks. 
Void volume contains any complexes ≥5,000 kDa. Notably, ~40-50% of the input 
material was lost during SEC as a result of nonspecific binding to filters and 
column matrix. 
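
The two calculations in this section (normalizing each fraction to the total counts of its run, and estimating trapped luciferase by difference) can be sketched as follows. This is an illustrative sketch only. The function names, the use of a single constant background value per fraction, and the treatment of the 11 to 16 window as inclusive are assumptions that the text does not specify.

```python
def normalize_fractions(counts, background):
    """Return each fraction as a percentage of the background-corrected total.

    `background` is assumed to be one constant value subtracted per fraction.
    """
    corrected = [c - background for c in counts]
    total = sum(corrected)
    if total <= 0:
        raise ValueError("total counts after background subtraction must be positive")
    return [100.0 * c / total for c in corrected]


def trapped_counts(volumes, with_trap, without_trap, low=11.0, high=16.0):
    """Counts in the elution window with the trap minus counts without it."""
    def window_sum(profile):
        # Sum counts of fractions whose elution volume lies in [low, high].
        return sum(c for v, c in zip(volumes, profile) if low <= v <= high)

    return window_sum(with_trap) - window_sum(without_trap)


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    print(normalize_fractions([110, 210, 310, 410], background=10))
    print(trapped_counts([10, 11, 12, 16, 17],
                         [5, 50, 60, 70, 5],
                         [5, 20, 30, 40, 5]))
```

Normalizing within each run makes profiles comparable even though, as noted above, a substantial share of the input material is lost on the column.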

Luciferase aggregation prevention assay. In the aggregation prevention assay, 
200 nM luciferase was mixed with indicated concentrations of chaperones or BSA 
(control) in a buffer containing 50 mM HEPES-KOH, pH 7.5, 50 mM KCl, 5 mM 
MgCl₂, 2 mM DTT and 2 mM ATP, pH 7.0. Aggregation of luciferase was initiated by 
increasing the temperature to 42 °C. The extent of luciferase aggregation was monitored 
by light scattering at 600 nm (Hitachi Fluorometer F4500, λex/em = 600 nm, 
slit widths of 5.0 nm) for 25 or 30 min. 

Chemical crosslinking coupled to mass spectrometry. For chemical cross-linking, 
100 µl of sample containing 2 µM JA2 and 2 µM JB1 was directly cross-linked 
with 1 mM disuccinimidyl suberate d0/d12 (DSS, Creativemolecules Inc.), and 
subsequently enzymatically digested with trypsin and enriched for cross-linked 
peptides, essentially as previously described. Liquid chromatography-tandem 
mass spectrometry (LC-MS/MS) analysis was carried out on an Orbitrap Elite 
mass spectrometer (Thermo Electron). Data were searched using xQuest in 
iontag mode with a precursor mass tolerance of 10 p.p.m. For matching of fragment 
ions, tolerances of 0.2 Da for common ions and 0.3 Da for cross-link ions 
were used. False discovery rates of cross-linked peptides were assigned using 
xProphet. Cross-linked peptides were identified with a delta score <0.95 and a 
linear discriminant score >20, and additionally analysed by visual inspection to 
ensure good matches of ion series on both cross-linked peptide chains for the most 
abundant peaks. 

FlAsH and ReAsH labelling of J-proteins. J-protein variants with introduced 
tetracysteine motif (CCPGCC, for FlAsH and ReAsH labelling) or with single 
cysteine residue (for JB1 Alexa Fluor 488 labelling) were reduced with 50-fold molar 
excess of TCEP for 30 min at room temperature, and incubated with FlAsH or 
ReAsH (gift from A. Krezel) at a 1:1.5 protein/label ratio for 4 h at 4 °C, or 20-fold 
excess of Alexa-Fluor-488-maleimide for 2 h at room temperature. Progress of 
labelling reaction for the biarsenical dyes was monitored in a spectrophotometer, 
using A280nm (JA2 ε280nm = 24,000 M⁻¹ cm⁻¹, JB1 ε280nm = 19,035 M⁻¹ cm⁻¹) 
for protein concentration and absorption at A510nm for FlAsH (ε510nm = 
41,000 M⁻¹ cm⁻¹), A599nm for ReAsH (ε599nm = 68,000 M⁻¹ cm⁻¹) and A494nm 
for Alexa Fluor 488 (ε494nm = 71,000 M⁻¹ cm⁻¹). The excess of unbound dye was 
removed on a self-packed Sephadex G-25 column (GE Healthcare), and the activity 
of labelled proteins was confirmed by luciferase refolding assay. 

FRET measurements. FRET was used to validate distances between labelled 
J-domains and CTDs of class A and B J-proteins. Emission spectra were recorded 
on a Jasco FP6500 spectrofluorimeter between 510 and 650 nm, at excitation 
wavelength of 508 nm for FlAsH and 488 nm for Alexa Fluor 488 (donor fluorophores). 
Quenching of donor fluorescence (at 519 nm for Alexa Fluor 488 and 
533 nm for FlAsH) and an increase in acceptor emission (at 608 nm for ReAsH) 
were quantified. Acceptor fluorescence measurement was refined by subtracting 
the fluorescence from donor-labelled J-protein to minimize background. The 
Förster radius of the FlAsH-ReAsH FRET pair was calculated to be 39 Å (refs 36, 
37) and that of Alexa-Fluor-488-ReAsH to be 62 Å (ref. 38). For FRET experiments, 
J-proteins labelled with donor and acceptor fluorophores were mixed at 0.1 µM 
(donor) and 1 µM (acceptor) in a buffer containing 25 mM HEPES, pH 7.5, 50 mM 
KCl and 5 mM MgCl₂, and allowed to equilibrate for 15 min at 30 °C before 
measuring the steady-state fluorescence. For competition experiments, 1-, 5- 
and 10-fold excess (relative to acceptor concentration) of unlabelled full-length 
proteins or isolated J-domains were added and allowed to equilibrate for 15 min at 
30°C. All samples were measured at least in duplicate. For Figs 2b and 3d, FRET 
efficiencies were calculated based on the donor fluorescence quenching, and pre- 
sented as a percentage of donor fluorescence in the absence of acceptor. 
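
As a rough illustration of the quantities involved, the sketch below computes a FRET efficiency from donor quenching and relates efficiency to donor-acceptor distance through the standard Förster relation, E = 1 / (1 + (r/R0)^6). The function names are hypothetical, and the Förster relation is the textbook expression rather than a formula quoted in this section; only the R0 values (39 Å and 62 Å) come from the text.

```python
def efficiency_from_donor_quenching(f_donor_only, f_donor_with_acceptor):
    """FRET efficiency as the fractional loss of donor fluorescence."""
    if f_donor_only <= 0:
        raise ValueError("donor-only fluorescence must be positive")
    return 1.0 - f_donor_with_acceptor / f_donor_only


def forster_efficiency(distance, r0):
    """Expected efficiency at a donor-acceptor distance (same units as r0)."""
    if distance < 0 or r0 <= 0:
        raise ValueError("distance must be non-negative and r0 positive")
    return 1.0 / (1.0 + (distance / r0) ** 6)


def distance_from_efficiency(efficiency, r0):
    """Invert the Förster relation; valid only for 0 < efficiency < 1."""
    if not 0.0 < efficiency < 1.0:
        raise ValueError("efficiency must lie strictly between 0 and 1")
    return r0 * (1.0 / efficiency - 1.0) ** (1.0 / 6.0)


if __name__ == "__main__":
    R0_FLASH_REASH = 39.0   # Å, FlAsH-ReAsH pair as stated in the text
    R0_ALEXA_REASH = 62.0   # Å, Alexa-Fluor-488-ReAsH pair as stated in the text
    print(forster_efficiency(39.0, R0_FLASH_REASH))
    print(forster_efficiency(62.0, R0_FLASH_REASH))
    print(distance_from_efficiency(0.5, R0_ALEXA_REASH))
```

Because efficiency falls with the sixth power of distance, a pair separated by much more than R0 gives an efficiency of only a few per cent, which is hard to distinguish from experimental error.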

Protein structure preparation. Protein structures used in simulations were either 
crystal or NMR structures from the RCSB Protein Data Bank (PDB; http://www.rcsb.org) 
or comparative models that were either present in the SWISS-MODEL 
database and found using the Protein Model Portal (PMP) or were 
modelled with SWISS-MODEL (SM). The structure of the CTDᴶᴮ¹ dimer 
was taken from the crystal structure (PDB code 3AGZ, resolution: 2.51 Å) and 
that of JDᴶᴮ¹ from the first entry of the NMR structure (PDB code 1HDJ). Since 
the N-terminus of chain B in the CTDᴶᴮ¹ dimer was missing three residues compared 
to chain A, the N-terminal nine residues from chain A were superimposed 
on the N-terminus of chain B to obtain coordinates for the missing three residues. 
Comparative models of the CTDᴶᴮ⁴ dimer and JDᴶᴮ⁴ were both found in the PMP 
and were based on the template structures 3AGZ and 1HDJ, respectively. To add 
the missing three residues at the N-terminus of chain B, the same procedure as for 
CTDᴶᴮ¹ was used. Comparative models of CTD monomers of JA1 and JA2 were 
both taken from the PMP. Both structures were modelled with SM based on the 
template crystal structure, 1NLT (resolution 2.70 Å). The structure of JDᴶᴬ¹ was 
the first entry of the NMR structure, 2LO1 in the RCSB PDB. A comparative model 
of JDᴶᴬ² was taken from the PMP and was based on the template structure, 2LO1. 
Structures of the CTD dimer were generated for JA1 and JA2 as follows: the 
dimerization site was modelled with SM based on the template crystal structure, 
1XAO (resolution 2.07 Å). Then, the structures of the CTD monomers were superimposed 
on the corresponding dimerization site model and only C-terminal missing 
residues of the dimerization site were added to the CTD domains. The 
structure of the J-domain of DNJ-12 was taken from the crystal structure PDB 
code 2OCH (resolution 1.86 Å). The CTD dimer of DNJ-12 was modelled based 
on the crystal structure, 1NLT, using SM. The J-domain of DNJ-13 was modelled 
based on 1HDJ and the dimer structure of the CTD of DNJ-13 was based on 3AGZ 
A/B. Further editing of the following structures was performed to generate a set of 
comparable structures of the J-proteins. N-terminal Gly was deleted in JDᴶᴬ¹, 
because it was not part of the UniProt entry P31689. The last seven residues of 
JDᴶᴮ¹ were deleted to have a comparable C-terminal end to the J-domains of JA1 
and JA2. Similarly, the last four C-terminal residues in JDᴶᴮ⁴ were deleted to obtain 
comparable C-terminal ends to the J-domains of JA1 and JA2. 

Protein-protein docking. Protein-protein docking was performed with a rigid- 
body treatment of the protein structures using the Simulation of Diffusional 
Association (SDA) program (version 7, http://mcm.h-its.org/sda7). SDA uses 
Brownian dynamics (BD) simulation to perform the sampling of protein config- 
urations subject to inter-protein forces and torques due to electrostatic and non- 
polar interactions. The docking protocol required the following steps: 

Structure preparation: polar hydrogen atoms were added to the protein 
structures with WHATIF5 (ref. 47) assuming pH 7.2. 

Calculation of protein electrostatic potentials: the electrostatic potential 
of each protein structure was calculated by numerically solving the linearized 
Poisson-Boltzmann equation with UHBD (ref. 48). Electrostatic 
potential grids with 250³ grid points with a 1 Å spacing were used for all 
proteins. The dielectric constants of the solvent and the protein were 
set to 78.0 and 4.0, respectively, and the dielectric boundary was 
defined by the protein’s van der Waals surface. The ionic strength was set 
to 50 mM at a temperature of 300 K, with an ion exclusion radius (Stern 
layer) of 1.5 Å. The protein atoms were assigned OPLS atomic partial 
charges and radii. 

Calculation of effective charges: these were derived with ECM. The effective 
charges for each protein were fit to reproduce the electrostatic 
potential in a 3-Å-thick layer extending outwards from the protein’s 
solvent-accessible surface computed as defined by a probe of radius 4 Å. 
The effective charges for proteins were placed on the carboxylate oxygen 
atoms of Asp and Glu amino acid residues and the C-terminus, and the 
amine nitrogen atoms of Lys and Arg amino acid residues and the 
N-terminus. For the Zn²⁺ ion, an effective charge site with a formal charge 
of −2e was placed on the ion, corresponding to the summed charge of the 
ion and its four coordinating cysteine side-chains. 

Calculation of polar desolvation grids: the desolvation penalty of each effective 
charge was computed as the sum of desolvation penalties due to the low 
dielectric cavity of each atom of the other protein, which was precomputed 
on a grid. The grid dimensions were set to 150³ grid points with a spacing of 
1 Å. Ionic strength and dielectric constants were assigned as for the electrostatic 
potential calculations. The ion radius was assigned as 1.5 Å. 

Calculation of non-polar desolvation grids: the non-polar desolvation 
forces were computed using precomputed grids. The distance parameters 
(a) and (b) were assigned values of 3.10 Å and 4.35 Å, respectively. The 
parameter (c) was assigned as 1.0 and the conversion factor to 
β = −0.0065 kcal mol⁻¹ Å⁻². The grid dimensions were set to 150³ grid 
points with a spacing of 1 Å. 

Calculation of excluded volume grids: protein shape was described by a grid 
with a 0.25 Å spacing. A probe of radius of 1.77 Å was used to determine the 
protein shape. The radius of the solvent probe to determine surface atoms 
was set to 1.4 Å. 

Docking simulations: for each protein pair docked, 10,000 trajectories were 
generated with SDA. Trajectories were started with the proteins at a separation 
distance of 100 Å and a random relative orientation. A trajectory was 
terminated if the protein separation exceeded 300 Å or a simulation time of 
500 ns was reached. The protein-protein separation was calculated as the 
distance between their centres of geometry (CoG). Up to 3,000 configurations 
sampled during the BD trajectories with a separation of less than 105 Å 
were saved. During the BD simulations, if a new docking pose was considered 
similar to a previously saved pose, that is, had an approximate root 
mean squared deviation (r.m.s.d.) less than 2 Å, then the configuration with 
the lower intermolecular energy was saved and the counter of this docking 
pose, the occupation, was incremented. The relative translational diffusion 
coefficient was set to 0.027 Å² ps⁻¹. The rotational diffusion coefficient for 
both proteins was set to 3.92 × 10* radian² ps⁻¹. The time step was 1 ps at 
separations less than 120 Å and increased linearly beyond this threshold 
with slope of 2 ps Å⁻¹. 
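
The distance-dependent time step and the trajectory termination rule described for the docking simulations can be written down in a few lines. This is a sketch of the stated rules only, not of SDA's implementation. The function and constant names are hypothetical, and the slope is taken as 2 ps per Å as read from the text.

```python
SWITCH_DISTANCE_A = 120.0   # Å, below this separation the step is constant
BASE_STEP_PS = 1.0          # ps, time step at short separations
SLOPE_PS_PER_A = 2.0        # ps per Å, linear growth beyond the threshold
MAX_SEPARATION_A = 300.0    # Å, trajectory ends beyond this separation
MAX_TIME_PS = 500_000.0     # ps, i.e. 500 ns


def time_step_ps(separation_a):
    """Time step as a function of the centre-of-geometry separation."""
    if separation_a < SWITCH_DISTANCE_A:
        return BASE_STEP_PS
    return BASE_STEP_PS + SLOPE_PS_PER_A * (separation_a - SWITCH_DISTANCE_A)


def trajectory_finished(separation_a, elapsed_ps):
    """True once the proteins have escaped or the time limit is reached."""
    return separation_a > MAX_SEPARATION_A or elapsed_ps >= MAX_TIME_PS


if __name__ == "__main__":
    for r in (100.0, 120.0, 130.0, 300.0):
        print(r, time_step_ps(r), trajectory_finished(r, 0.0))
```

Using larger steps at large separations is a common way to save computation where the inter-protein forces are weak, while keeping fine steps near contact.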

Clustering: for each protein pair docked, the configurations saved were 
clustered with a hierarchical clustering algorithm. The backbone r.m.s.d. 
between each docked protein configuration was calculated to produce an 
inter-configuration distance matrix. Initially, each docked structure was 
assigned to a separate cluster. The closest clusters were found and merged; 
the distance matrix was updated. This process was repeated until all docked 
protein structures were in one cluster. The distance between clusters was 
defined as the average backbone r.m.s.d. between docked protein structures 
in one cluster relative to structures in another cluster. The representative of a 
cluster is the protein configuration with the smallest r.m.s.d. to every other 
member of the cluster. In each clustering cycle, the mean and s.d. of the 
r.m.s.d. of all members of each cluster to the corresponding cluster representative 
were calculated. The number of configurations in each cluster in 
each clustering cycle was determined, taking account of cluster occupation 
during the BD simulations, and the clusters were ranked by size. The number 
of generated clusters was chosen using the following criteria. Starting 
with the largest cluster, the minimum number of clusters accounting 
for 90% of the total number of configurations docked and satisfying the 
criterion that the mean r.m.s.d. plus s.d. of the clusters is less than 10 Å, was 
determined. This threshold results in configurations with similar CoG of the 
J-domains but orientations differing by about 90° being assigned to different 
clusters. 
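
The clustering procedure above corresponds to agglomerative clustering with average linkage. The sketch below is a small pure-Python illustration under stated assumptions, not the authors' code. It takes a precomputed symmetric r.m.s.d. matrix, interprets "smallest r.m.s.d. to every other member" as the smallest summed r.m.s.d., and implements only the 90% coverage part of the selection rule (the mean plus s.d. < 10 Å check is omitted). All function names are hypothetical.

```python
def average_linkage(dist, n_clusters):
    """Merge the closest clusters (average pairwise distance) until n_clusters remain."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                pairs = len(clusters[a]) * len(clusters[b])
                d = sum(dist[i][j] for i in clusters[a] for j in clusters[b]) / pairs
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


def representative(cluster, dist):
    """Member with the smallest summed distance to all cluster members."""
    return min(cluster, key=lambda i: sum(dist[i][j] for j in cluster))


def clusters_covering(sizes, fraction=0.9):
    """Smallest number of largest clusters that together hold `fraction` of all poses."""
    ranked = sorted(sizes, reverse=True)
    total = sum(ranked)
    running = 0
    for k, size in enumerate(ranked, start=1):
        running += size
        if running >= fraction * total:
            return k
    return len(ranked)


if __name__ == "__main__":
    # Toy distance matrix for four poses lying at 0, 1, 10 and 11 on a line.
    points = [0.0, 1.0, 10.0, 11.0]
    dist = [[abs(p - q) for q in points] for p in points]
    print(average_linkage(dist, 2))
    print(clusters_covering([50, 30, 15, 5]))
```

In the procedure described above, cluster sizes also include the occupation counter from the BD simulations, so `sizes` here would be occupation-weighted counts rather than the number of saved poses.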

Of note, class B CTD dimers mostly showed a higher number of selected and 
total clusters with less favourable interaction energies than class A CTD dimers. 
This indicates a higher diversity in the docking poses of J-domains to class B CTD 
dimers (see clustering at N-termini of class B CTD dimers, Extended Data Fig. 7). 
However, in a full-length protein, this clustering region would be occupied by 
glycine-phenylalanine-rich linkers and J-domains as in the crystal structure of 
class B DnaJ2 from Thermus thermophilus (PDB code 4J80). Consistently, 
a docking simulation performed with full-length DnaJ2 showed a much more 
specific interaction of the DnaJ2 J-domain centring between CTD-I and CTD-II 
regions (data not shown). To take this into account, we also analysed the docking 
clusters for JDᴶᴬ² and CTDᴶᴮ¹ by requiring docking positions to satisfy distance 
requirements from cross-linking data (see Fig. 2d and Extended Data Fig. 7). 

Modelling of JA2 and JB1 complexes. Using PyMOL (http://www.pymol.org) 
software, two putative arrangements (compact and open) of the JA2 and JB1 CTD 
homodimers were generated to satisfy the maximum number of observed cross- 
links and FRET constraints possible. The J-domains of JA2 and JB1 were added 
ensuring consistency with docking, cross-linking and FRET results and a distance 
to the corresponding CTD that would be allowed by the missing residues that 
connect J-domain and CTD. The positions of the J-domains of JB1 and JA2 in the 
two models are expected to provide a FRET efficiency of 8 and 3% (within experi- 
mental error), correlating with the lack of experimental observation of FRET 
between these domains. Note that the structures were treated as rigid bodies 
and flexibility of the CTD dimers parallel and perpendicular to the dimer plane 
is very likely and would allow other configurations of these complexes. 

Extended Data Figure 1 | Characterization of protein disaggregation/ 
refolding and refolding-only reactions. a, HSP70-J-protein-HSP110 
(HSPA8-J-protein-HSPH2) functional cycle. Concomitant interaction of 
HSP70 with a J-protein and substrate results in allosteric stimulation of ATP 
hydrolysis; this traps the substrate in HSP70 (ref. 8). Subsequent NEF (for 
example, HSP110)-promoted ADP dissociation from HSP70 then allows ATP 
rebinding, which triggers substrate release to complete the cycle. b, Scheme 
for in vitro disaggregation/refolding and refolding-only reactions. The 
aggregates used in disaggregation/refolding assays are preformed by heating 
luciferase with yeast small heat-shock protein (sHSP) Hsp26 (ref. 4), which is 
known to co-aggregate with misfolded proteins in vivo (see Methods for 
detailed description). If HSP70, J-protein and HSP110 are instead heated 
together with substrate and Hsp26, luciferase is denatured into a more easily 
refoldable, inactive and largely monomeric substrate form used in refolding-only 
assays. c, SEC profiles of aggregated ³H-labelled luciferase (black; size 
range 200 kDa to ≥5,000 kDa representing ~2 to >50 aggregated luciferase 
molecules) and monomeric native luciferase (red; size ~63 kDa). Arrows 
indicate elution size (kDa). Inset, activity of loaded material. d, SEC profile of 
partially denatured and largely monomeric luciferase (starting material for 
refolding-only reactions). Inset, activity of loaded material. e, Chaperone 
nomenclature. f, Disaggregation and reactivation of preformed luciferase 
aggregates using human HSP70-HSP110 with human J-proteins JA2, green; 
JB1, blue; JA2+JB1, magenta or no J-protein, black, under limiting chaperone 
(HSP70/HSP110) and increasing J-protein concentrations (A, solid or B, 
dashed) (n = 3). Data are mean ± s.e.m. Precise concentrations are shown in 
Extended Data Table 1. 

Extended Data Figure 2 | Effects of mixed-class J-proteins on 
disaggregation/refolding and refolding-only activity of the HSP70 system. 
a, Disaggregation/refolding of aggregated luciferase compared for human 
class A (JA1 and JA2) and class B (JB1 and JB4) J-proteins (n = 3). b, Luciferase 
refolding-only compared for JA1, JA2, JB1 and JB4 (n = 3). c, Reactivation 
of heat-aggregated luciferase with nematode HSP70 machinery, using 
reduced substrate:HSP70 ratio of 1:20, containing DNJ-12 (A), DNJ-13 (B) or 
DNJ-12+DNJ-13 (A+B) (n = 2). d, Disaggregation/refolding of luciferase 
using human HSP70 and HSP110 combined with nematode J-proteins (n = 3). 
e, Reactivation of luciferase showing optimal JA2:JB1 ratio for disaggregation/ 
refolding (n = 2). f, Initial disaggregation/refolding rates for e. g, Final yields of 
refolded luciferase (120 min) for e. Data are mean ± s.e.m. Precise 
concentrations are shown in Extended Data Table 1. 

Extended Data Figure 3 | Disaggregation synergy is independent of sHSP 
incorporation, NEF, substrate and aggregate character, and is not explained 
by sequential J-protein class activity. a, Disaggregation/refolding reaction for 
luciferase aggregates without incorporating sHSP Hsp26 (n = 3). 
b, Reactivation without NEF (HSPH2) (n = 3). c, Reactivation of α-glucosidase 
aggregates (n = 3). d, Reactivation of preformed MDH aggregates in the 
presence of GroEL plus the GroES protein foldase system (GroELS) (n = 2). 
GroELS is required for efficient MDH refolding. GroELS alone is in black. 
JB1:JA2 denotes the stoichiometry of each reaction. e, Disaggregation/refolding 
of stringent aggregates (≥5,000 kDa) formed using 2 µM luciferase (n = 3). 
f, Disaggregation/refolding of aggregated luciferase at reduced substrate:HSP70 
ratio (luciferase:HSP70:J-protein:HSP110 = 1:7.5:3.8:0.4) (n = 3). The 
aggregated luciferase concentration is 100 nM. g, h, Holdase function of 
J-proteins (class A (g) and class B (h)) during luciferase aggregation at 42 °C, 
shown by decreased light scattering. Concentrations: 1× luciferase; 4× 
J-protein; 4× BSA (control) (n = 2). i, Reactivation with sequential JA2 and JB1 
addition. J-protein added at t = 0 min (black graph legends); J-protein added 
after 30 or 60 min (red graph legends and arrows) (n = 2). Data are 
mean ± s.e.m. Precise concentrations are shown in Extended Data Table 1. 

Extended Data Figure 4 | Stoichiometry of class A and B J-proteins 
determines the range of aggregate sizes resolved. a, The GroEL trap 
(GroELᴰ⁸⁷ᴷ) facilitates the capture of ³H-luciferase monomers liberated by 
protein disaggregation before the refolding step. b, Refolding of disaggregated 
³H-luciferase monomers (40 min) in the absence (solid bars) and presence of 
GroEL trap (open bars). c, SEC profile after disaggregation/refolding of 
aggregated tritiated α-glucosidase (60 min) with either J-protein class alone 
(green (A) or blue (B)) or J-proteins combined (magenta). Control reaction 
without chaperones (black). Elution fractions F1-F4 (red lines). Table shows 
size distribution of aggregates in each fraction; F1, luciferase aggregates 
≥4,000 kDa; F2, aggregates ~400-4,000 kDa; F3, aggregates ~150-400 kDa; 
F4, disaggregated monomers (~68 kDa). d, Quantification of SEC profile 
measuring disaggregation of tritiated α-glucosidase from aggregates (F1-F3) 
from c, also showing concomitant accumulation of disaggregated monomer 
(F4) from c (n = 3). e, ATP depletion by apyrase abrogates disaggregation. 
f, Quantification of SEC profile measuring disaggregation of tritiated luciferase 
from aggregates (F1-F3) with concomitant accumulation of disaggregated 
monomer (F4), using the HSP70-HSP110 system with JA2 or JB1 alone, or 
with JA2 plus JB1. Stoichiometry range used for JA2:JB1, 1:1 to 4:1 to 1:4. 
Specifically, 0.2 JB1:0.8 JA2 (orange); 0.2 JA2:0.8 JB1 (red). Solid colours denote 
40-min reaction time; hash denotes 120 min. Control reaction without 
chaperones (black). Two-tailed t-test, *P < 0.05, **P < 0.01 (n = 3). Data are 
mean ± s.e.m. Precise concentrations are shown in Extended Data Table 1. 

Extended Data Figure 5 | JA2 and JB1 form homodimers and interact 
transiently. a, Identified JA2 and JB1 inter-molecular cross-links; ‘Id’, amino 
acid sequence of peptides showing cross-linked lysines (K, orange). Protein 1 
and 2 denote source proteins for cross-linked peptides; position 1 and 2 denote 
positions of cross-linked lysines within proteins; deltaS is the delta score for 
each crosslink; cut-off = 0.9. ld-Score is the linear discriminant score. 
b, Representative mass spectrometry spectra for inter-molecular JA2 and JB1 
cross-links. Common peaks, green; cross-linked, red; matched peaks, diamonds 
(no peaks above 1,100 m/z detected). c, SEC profiles of ³H-labelled JA2 
dimer (green cartoon) and ³H-labelled JB1 dimer (blue cartoon) mixed with 
unlabelled J-protein from the other class. Precise concentrations are shown in 
Extended Data Table 1. 

Extended Data Figure 6 | Electrostatic interactions between J-domain and 
CTD predominate in JA2 and JB1 complexes. a, FRET efficiencies for 
JD-CTD and CTD-CTD interactions with 0-0.2% Tween20 titration. 
Percentage efficiency is relative to untreated (0% Tween20) samples. Donor 
quenching (black); acceptor fluorescence (red); below, fluorophore positions in 
J-protein protomers (JA2, green; JB1, blue). N-termini of JDᴶᴬ² and JDᴶᴮ¹ 
labelled with acceptor fluorophore ReAsH. CTDᴶᴬ² and CTDᴶᴮ¹ labelled with 
donor fluorophores FlAsH and Alexa Fluor 488 at residues 241 and 278, 
respectively. b, Disaggregation/refolding of preformed luciferase by JA2 and/or 
JB1 with increasing amounts of Tween20 (n = 2). c, FRET efficiencies for 
JA2 and JB1 interactions at increasing salt concentrations. d, Disaggregation/ 
refolding of preformed luciferase aggregates by JA2 and JB1 with increasing salt 
concentrations; control, 50 mM salt, no chaperones (n = 2). e, Luciferase 
disaggregation/refolding in the presence of excess J-domain fragments carrying 
JD-QPNᴶᴬ²/ᴶᴮ¹ mutation of the HPD motif (n = 3). f, g, FRET between class 
A and B J-proteins. f, Competition with unlabelled full-length wild-type 
J-protein (FL); unlabelled competitor is 1-10× acceptor; (−), no competitor. 
g, Competition with unlabelled isolated JDᴶᴬ² and JDᴶᴮ¹. Data are 
mean ± s.e.m., average of at least two experiments for FRET experiments. 
Precise concentrations are shown in Extended Data Table 1. 

Extended Data Figure 7 | In silico prediction of JD-CTD interactions 
between class A and B J-proteins and in vitro evidence that physical 
interactions between J-proteins do not overlap J-protein substrate binding 
sites. a, Preferred positions of the centres of geometry (CoG) of J-domains 
(y axis, JA1, JA2, JB1 and JB4) around CTD dimers (x axis, class A, green, class 
B, blue) obtained from molecular docking simulations. JDᴶᴬ¹/ᴶᴮ¹, wireframe 
meshes; JDᴶᴬ²/ᴶᴮ⁴, brown contours, each contoured at the isovalue given in the 
top left of each image. The higher scores for class A CTDs indicate greater 
specificity of the complexes formed with J-domains; the lower scores for class B 
CTDs indicate much less specific interactions. Lysines in inter- and 
intra-J-protein JA2-JB1 cross-links, orange spheres. b, Properties of the 
docking arrangements obtained after clustering. Total number of clusters per 
simulation, denominator; number of selected clusters (corresponding to 90% of 
all docked complexes), numerator, bold. In parentheses, the range of average 
energy values (in units of kT) for the selected clusters. Lower energy values 
indicate more favourable binding; fewer clusters indicate a more defined 
binding mode (see Methods). JDᴶᴬ² docking to CTDᴶᴮ¹ is much weaker and less 
specific than JDᴶᴮ¹ docking to CTDᴶᴬ², but docking arrangements compatible 
with cross-linking results still obtain (Fig. 2d). c, Competition of isolated JD!" 
fragments against JA2 holdase function in luciferase aggregation at 42 °C 

(n = 2). d, Competition of isolated JD! fragments against JA2 holdase 
function (n = 2). Luciferase, 1X; JA2, 4X; isolated J-domain fragments, 20x 
(red; 5-fold excess over JA2), or 40X (orange; 10-fold excess over JA2). Light 
scattering measured at 600 nm. Precise concentrations are shown in Extended 
Data Table 1. 

Extended Data Figure 8 | Possible configurations of the JA2-JB1 mixed-class 
complex. a, Compact configuration. b, Open configuration. 
Configurations were derived from computational docking, using constraints 
from experimental FRET and cross-linking data (Fig. 2a-d and Extended Data 
Fig. 5a). Each configuration is shown from two views (left and right) rotated by 
135 degrees with respect to each other and in ribbon (top) and molecular 
surface (bottom) representations. In both cases J-domains of JA2 dock onto the 
CTD dimer of JB1, and similarly J-domains of JB1 dock to the CTD dimer of 
JA2. Both CTDᴶᴮ¹ protomers are within cross-linking distance of CTDᴶᴬ². 
Unstructured glycine/phenylalanine (G/F)-rich flexible regions connecting 
J-domains and CTDs shown by dark blue (JB1) or green (JA2) dashed lines. 
Residues at FRET fluorophore sites are shown in space-filling representation 
(red on JA2, magenta on JB1). Inter-molecular crosslinking lysine pairs (gold 
and cyan, space-filling) are connected by dotted lines. Bottom left within 
a: molecular surface representation of compact configuration of the JA2-JB1 
complex, showing substrate binding sites from crystallographic (yellow) 
and biochemical (orange, cyan) data. HPD motif, red. Residues implicated in 
JD-HSP70 interactions (dark teal and dark green on JDᴶᴬ²; purple and 
dark blue on JDᴶᴮ¹). Bottom right within a: rotated image. Table shows 
fluorophore separation distances; calculated percentage FRET efficiencies in 
parentheses. a, Both CTDᴶᴮ¹ protomers are within cross-linking distance of 
CTDᴶᴬ². b, As in a, but with only a single CTDᴶᴮ¹ protomer within cross-linking 
distance to CTDᴶᴬ²; one JDᴶᴬ² docks onto CTDᴶᴮ¹, the other JDᴶᴬ² is free. 
Similarly, one JDᴶᴮ¹ docks onto CTDᴶᴬ², the other JDᴶᴮ¹ docks onto its own 
CTD, consistent with SAXS determination of class B J-proteins. Model of 
JB1 (blue) based on the crystal structure of CTD and NMR structure of 
J-domain. Homology model of JA2 (green) based on the crystal structure of 
Ydj1 (see Methods). 
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Extended Data Table 1 


Panel Heat treatment 
time/ Temp 
Fig.1b 15 min/ 45°C 
Fig.1c 15 min/ 45°C 
Fig.1d 15 min/ 45°C 
Fig.1e,f 15 min/ 45°C 
Fig.1g 15 min/ 45°C 
Fig.2e 15 min/ 45°C 
Fig.2f 10 min/ 42°C 
Fig.3e 15 min/ 45°C 
Fig.3f 10 min/ 42°C 
Panel Heat treatment 
time/ Temp 
EDFig.1c 15 min/ 45°C 
EDFig.1d 10 min/ 42°C 
EDFig.1f 15 min/ 45°C 
EDFig.2a 15 min/ 45°C 
EDFig.2b 10 min/ 42°C 
EDFig.2c 15 min/ 45°C 
EDFig.2d 15 min/ 45°C 
EDFig.2 15 min/ 45°C 
e,f.g 
EDFig.3a 15 min/ 45°C 
EDFig.3b 15 min/ 45°C 
EDFig.3e 15 min/ 45°C 
EDFig.3f 15 min/ 45°C 
EDFig.3i 15 min/ 45°C 
EDFig.4b 15 min/ 45°C 
EDFig.4e 15 min/ 45°C 
EDFig.4f 15 min/ 45°C 
EDFig.5c NA 
EDFig.6b 15 min/ 45°C 
EDFig.6d 15 min/ 45°C 
EDFig.6e 15 min/ 45°C 
Panel Heat treatment 
time/ Temp 
EDFig.3c 15 min/ 50°C 


EDFig.4c,d 15 min/ 50°C 


Heat treatment 
time/ Temp 


Panel 


EDFig.3d 30 min/ 47°C 


Tabulated reaction conditions 


Luciferase/ Hsp26 rxn concentrations 
(aggregating or denaturing conditions) 


Chaperone mixture 


20 nM/ 100 nM (25 nM/ 125 nM) 
20 nM (25 nM) 
20 nM/ 100 nM (25 nM/ 125 nM) 


20 nM/ 100 nM (25 nM/ 125 nM) 
20 nM/ 100 nM (25 nM/ 125 nM) 


20 nM/ 100 nM (25 nM/ 125 nM) 
20 nM/ 100 nM (20 nM/ 100 nM) 


20 nM/ 100 nM (25 nM/ 125 nM) 
20 nM/ 100 nM (20 nM/ 100 nM) 


Luciferase/ Hsp26 rxn concentrations 
(aggregating or denaturing conditions) 


20 nM/ 100 nM (25 nM/ 125 nM) 
20 nM/ 100 nM (20 nM/ 100 nM) 
20 nM/ 100 nM (25 nM/ 125 nM) 


20 nM/ 100 nM (25 nM/ 125 nM) 
20 nM/ 100 nM (20 nM/ 100 nM) 
20 nM (25 nM) 

20 nM/ 100 nM (25 nM/ 125 nM) 
20 nM/ 100 nM (25 nM/ 125 nM) 


20 nM (25 nM) 
20 nM/ 100 nM (25 nM/ 125 nM) 
20 nM/ 100 nM (2 uM/ 10 uM) 


100 nM/ 500 nM (100 nM/ 500 nM) 


20 nM/ 100 nM (25 nM/ 125 nM) 


20 nM/ 100 nM (25 nM/ 125 nM) 


20 nM/ 100 nM (25 nM/ 125 nM) 
20 nM/ 100 nM (25 nM/ 125 nM) 


NA 


20 nM/ 100 nM (25 nM/ 125 nM) 
20 nM/ 100 nM (25 nM/ 125 nM) 
20 nM/ 100 nM (25 nM/ 125 nM) 


a-glucosidase/ Hsp26 rxn 


concentrations (aggregating conditions) 


40 nM/ 400 nM (50 nM/ 500 nM) 
40 nM/ 400 nM (50 nM/ 500 nM) 


malate dehydrogenase/ Hsp26 rxn 
concentrations (aggregating conditions) 


150 nM/ 750 nM (150 nM/ 750 nM) 


Panel Light scattering Luciferase 
EDFig.3g,h 600 nm 200 nM 
EDFig.7c,d 600 nm 200 nM 


HSP70 [2 uM], NEF [0.1 MI], J-protein [1 uM], 0.5 J-protein [0.5 uM] 
HSP-1 [8 uM], NEF [4 uM], J-protein [4 uM], 0.5 J-protein [2 uM] 

HSP70 [750 nM], NEF [40 nM], GroEL°87K [10 uM], J-protein [380 nM], 
0.5 J-protein [190 nM] 

HSP70 [750 nM], NEF [40 nM], J-protein [380 nM], 0.5 J-protein [190 nM] 
HSP70 [750 nM], NEF [40 nM], J-protein [380 nM], 0.5 J-protein [190 nM] 
HSP70 [750 nM], NEF [40 nM], J-protein [380 nM], 0.5 J-protein [190 nM], 
1x JD [380 nM], 5x JD [1.9 uM], 10x JD [3.8 pM] 

HSP70 [750 nM], NEF [40 nM], J-protein [380 nM], 0.5 J-protein [190 nM], 
1x JD [380 nM], 5x JD [1.9 pM], 10x JD [3.8 uM] 

HSP70 [750 nM], NEF [40 nM], J-protein [380 nM], 0.5 J-protein [190 nM] 
HSP70 [750 nM], NEF [40 nM], J-protein [380 nM], 0.5 J-protein [190 nM] 


Chaperone mixture 


NA 
HSP70 [750 nM], NEF [40 nM], 0.5 JA2 [190 nM}, 0.5 JB1 [190 nM] 


HSP70 [750 nM], NEF [40 nM], J-protein [380 nM], 1/2x J-protein [190 nM], 
3x J-protein [1140 nM], 6x J-protein [2280 nM], 9x J-protein [3420 nM] 


HSP70 [750 nM], NEF [40 nM], J-protein [380 nM], 0.5 J-protein [190 nM] 
HSP70 [750 nM], NEF [40 nM], J-protein [380 nM], 0.5 J-protein [190 nM] 
HSP-1 [400 nM], NEF [20 nM], J-protein [200 nM], 0.5 J-protein [100 nM] 
HSP70 [3 uM], NEF [0.1 uM], J-protein [1 yM], 0.5 J-protein [0.5 uM] 
HSP70 [750 nM], NEF [40 nM], 100% J-protein [380 nM] 


HSP70 [750 nM], NEF [40nM], J-protein [380 nM], 0.5 J-protein [190 nM] 
HSP70 [750 nM], J-protein [380 nM], 0.5 J-protein [190 nM] 

HSP70 [2 uM], NEF [0.1 uM], J-protein [1 yM], 0.5 J-protein [0.5 uM] 
HSP70 [750 nM], NEF [40 nM], J-protein [380 nM], 0.5 J-protein [190 nM] 
HSP70 [750 nM], NEF [40 nM], J-protein [380 nM], 0.5 J-protein [190 nM] 


HSP70 [750 nM], NEF [40 nM], GroEL?87K [10 uM], J-protein [380 nM], 
0.5 J-protein [190 nM] 


HSP70 [750 nM], NEF [40 nM], 0.5 J-protein [190 nM] 

HSP70 [750 nM], NEF [40 nM], J-protein [380 nM], 0.5 J-protein [190 nM], 
0.2 J-protein [76 nM], 0.8 J-protein [304 nM] 

J-protein [190 nM] 

HSP70 [750 nM], NEF [40 nM], 0.5 J-protein [190 nM] 

HSP70 [750 nM], NEF [40 nM], 0.5 J-protein [190 nM] 

HSP70 [750 nM], NEF [40 nM], 0.5 J-protein [190 nM], 

1x JD [380 nM], 5x JD [1.9 uM], 10x JD [3.8 uM] 


Chaperone mixture 


HSP70 [3 uM], NEF [0.1 uM], J-protein [1 uM], 0.5 J-protein [0.5 uM] 
HSP70 [3 uM], NEF [0.1 uM], J-protein [1 uM], 0.5 J-protein [0.5 uM] 


Chaperone mixture 


HSP70 [2 uM], NEF [0.1 uM], 1 J-protein [0.063 uM], 2 J-protein [0.126 uM], 
4 J-protein [0.252 uM], GroEL [1 uM], GroES [1 uM] 


Chaperone mixture 


4x J-protein [800 nM], 4x BSA (control) [800 nM] 
4x J-protein [800 nM], 20x JD [4 uM], 40x JD [8 uM] 


Precise concentrations and conditions tabulated for reactions in Figures as indicated in first column (Panel). EDFig., Extended Data Figure. 
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Reaction time/ 
Temp 


120 min/ 30°C 
120 min/ 20°C 
40 min/ 30°C 


120 min/ 30°C 
40 min/ 30°C 


120 min/ 30°C 
80 min/ 30°C 


120 min/ 30°C 
80 min/ 30°C 


Reaction time/ 
Temp 

0 min 

0 min 

120 min/ 30°C 


120 min/ 30°C 
80 min/ 30°C 

120 min/ 20°C 
120 min/ 22°C 
120 min/ 30°C 


120 min/ 30°C 
120 min/ 30°C 
120 min/ 30°C 
120 min/ 30°C 
120 min/ 30°C 


40 min/ 30°C 


240 min/ 4-30°C 
40-120 min/ 30°C 

10 min/ 30°C 
120 min/ 30°C 
90 min/ 30°C 
120 min/ 30°C 


Reaction time/ 
Temp 


300 min/ 30°C 
60 min/ 30°C 


Reaction time/ 
Temp 


220 min/ 30°C 
Reaction time/ 
Temp 


30 min/ 42°C 
30 min/ 42°C 
X-ray structure of a mammalian stearoyl-CoA desaturase 


Yonghong Bai, Jason G. McCoy, Elena J. Levin, Pablo Sobrado, Kanagalaghatta R. Rajashankar, Brian G. Fox & Ming Zhou 


Stearoyl-CoA desaturase (SCD) is conserved in all eukaryotes and 
introduces the first double bond into saturated fatty acyl-CoAs. 
Because the monounsaturated products of SCD are key precursors 
of membrane phospholipids, cholesterol esters and triglycerides, 
SCD is pivotal in fatty acid metabolism. Humans have two SCD 
homologues (SCD1 and SCD5), while mice have four (SCD1-SCD4). 
SCD1-deficient mice do not become obese or diabetic when 
fed a high-fat diet because of improved lipid metabolic profiles and 
insulin sensitivity. Thus, SCD1 is a pharmacological target in the 
treatment of obesity, diabetes and other metabolic diseases. SCD1 
is an integral membrane protein located in the endoplasmic reticulum, 
and catalyses the formation of a cis-double bond between the 
ninth and tenth carbons of stearoyl- or palmitoyl-CoA. The reaction 
requires molecular oxygen, which is activated by a di-iron 
centre, and cytochrome b5, which regenerates the di-iron centre. 
To understand better the structural basis of these characteristics of 
SCD function, here we crystallize and solve the structure of mouse 
SCD1 bound to stearoyl-CoA at 2.6 Å resolution. The structure 
shows a novel fold comprising four transmembrane helices capped 
by a cytosolic domain, and a plausible pathway for lateral substrate 
access and product egress. The acyl chain of the bound stearoyl-CoA 
is enclosed in a tunnel buried in the cytosolic domain, 
and the geometry of the tunnel and the conformation of the bound 
acyl chain provide a structural basis for the regioselectivity 
and stereospecificity of the desaturation reaction. The dimetal 
centre is coordinated by a unique spatial arrangement of nine 
conserved histidine residues that implies a potentially novel 
mechanism for oxygen activation. The structure also illustrates 
a possible route for electron transfer from cytochrome b5 to the 
di-iron centre. 

A Δ2-23 amino-terminal truncation of mouse SCD1 was crystallized 
in lipidic cubic phase and the structure was solved by 
single-wavelength anomalous dispersion (Extended Data Table 1). SCD1 has 
four transmembrane helices (TM1-TM4) arranged in a cone-like 
shape with TM4 sandwiched between TM1 and TM2 (Fig. 1a). 
Residues in the membrane-spanning region are largely hydrophobic, 
with the notable exception of a conserved arginine (Arg249; Extended 
Data Fig. 1) located on TM4 in the centre of the cone (Extended Data 
Fig. 2). Previous biochemical analysis determined that the amino and 
carboxy termini are on the cytosolic side of the membrane (Fig. 1b). 
On the cytosolic side, TM2 and TM4 protrude three helical turns out of 
the membrane and provide some of the coordinating residues for the 
dimetal active site. The cytosolic domain comprises 93 residues 
between TM2 and TM3 (C1) and the 90-residue C terminus (C2) 
(Fig. 2a). The C1 and C2 domains contain six and five α-helices, 
respectively. Three of the α-helices (AH1 on C1 and AH7 and AH9 
on C2) are amphipathic and probably reside at the interface between 
the cytosolic domain and the lipid bilayer (Fig. 1a and Extended Data 
Fig. 3). These amphipathic helices and the locations of hydrophobic 
residues on the transmembrane helices indicate the approximate position 
of the lipid bilayer (Fig. 1a). In the crystal lattice, the interacting 
surfaces between neighbouring molecules are small (Extended Data 
Fig. 4), and size-exclusion chromatography (SEC) of the 
detergent-solubilized protein indicated that mouse SCD1 was stable as a 
monomer. By contrast, previous in vivo studies have shown that SCDs are 
dimers in the cellular membrane. Whether this difference is a 
consequence of isolation of the enzyme remains to be determined. 

The cytosolic domain contains a substantial non-protein density 
consistent with an 18-carbon acyl-CoA molecule (Fig. 2a and 
Extended Data Fig. 5a). We modelled a stearoyl-CoA molecule into 
this density, although we were unable to distinguish between oleoyl- 
CoA and stearoyl-CoA solely from the crystallographic maps. The 
electron density for the CoA moiety is well resolved, and the CoA 
group interacts primarily with hydrophilic and charged residues on 
the outer surface of the C1 domain (Fig. 2b). The residues that form 
polar interactions with the CoA group in the mouse SCD1 structure 
are strongly conserved among known stearoyl-CoA desaturases, 
including human SCD1, but not among stearoyl-lipid desaturases 
(Extended Data Fig. 1). 

The acyl chain is enclosed in a long, narrow tunnel extending 
approximately 24 Å into the mostly hydrophobic interior of the protein. 
This tunnel is sharply kinked where it binds to C9 and C10 on 
stearoyl-CoA, the atoms involved in formation of the cis-double bond 
(Fig. 2c). The positioning of C9 and C10 by the kink is enforced by the 
shape complementarity of this substrate tunnel, by the location of the 
CoA binding site, and by a hydrogen bond between the Trp258 side 
chain and the acyl carbonyl (Fig. 2c). The kink in the tunnel is created 
by the side chains of two conserved residues, Trp149 and Thr257, 
which are stabilized by hydrogen bonds with conserved Gln143 
(Fig. 2c). We note that the narrow and kinked tunnel precisely positions 
the acyl chain for Δ9-regioselective desaturation, an idea that was 
envisioned by Bloch over four decades ago. 

Previous studies showed that rat SCD1 was effective on acyl chains 
containing between 14 and 19 carbons, and had the highest activity 
with substrates 17 to 19 carbons in length. Regardless of the acyl chain 
length, the double bond is exclusively placed between C9 and C10, as 
enforced by the substrate-enzyme interactions described above. 
Residues that probably have a role in determining substrate length 
are found at the end of the substrate tunnel, which is capped by 
Tyr104 on TM2 (Fig. 2c). There is a distance of 4.1 Å between the 
end of the 18:0 acyl chain and the tyrosine hydroxyl oxygen, which 
agrees well with the observed preference of the enzyme for 18:0 (ref. 9). 
Tyr104 is highly conserved in animal SCD1. However, an atypical 
acyl-CoA desaturase (ChDes1) from the marine copepod Calanus 
hyperboreus has a threonine at the position corresponding to Tyr104 in 
mouse SCD1 (ref. 14). ChDes1 preferentially acts on very long-chain 
fatty acyl-CoAs (22:0-26:0), but when this threonine was mutated to 
tyrosine, desaturation of 26:0 was lost while desaturation of 18:0 was 
retained. Another conserved residue, Ala108, is located one helical 
turn above Tyr104 facing the substrate tunnel (Fig. 2c). Desat2 from 
Drosophila melanogaster has a methionine at this position, and can 
only accept acyl substrates up to 14 carbons long. Combined, these 
observations suggest that the tunnel-facing residues 104 and 108 on 
TM2 are critical determinants of the substrate chain length. 

To explore the relationship between the structure of the substrate 
tunnel in mouse SCD1 and acyl chain selectivity further, we 
transformed yeast monounsaturated fatty acid auxotroph L8-14C with 
either mouse SCD1 or SCD3, which allowed growth in media lacking 
unsaturated fatty acids. Although SCD1 and SCD3 share 89% primary 
sequence identity, they yield remarkably different total fatty acid 
profiles in the yeast host cells, probably reflecting differences in their 
preferences for reaction with 16:0 and 18:0 (Fig. 2e and ref. 16). In 
SCD1, Ala108, Leu109, Ala288 and Val289 line the distal end of the 
substrate binding channel, Ala115 is near the position of double bond 
formation, and Gln277 and Ser278 are on the cytoplasmic surface 
opposite to the CoA binding site. The corresponding residues in 
SCD3 are Ile112, Glu113, Ser292 and Met293, Val119, and Asp281 
and Pro282, respectively (Fig. 2d). The stacked mutations 
Ile112Ala/Glu113Leu were able to convert SCD3 from exclusively a 16:0 
desaturase into a predominantly 18:0 desaturase (Fig. 2e, f and 
Extended Data Fig. 6). The stacked mutations 
Val119Ala/Asp281Gln/Pro282Ser, which are located away from the end of 
the substrate tunnel, caused no change in the reaction specificity. 

Figure 1 | Structure and topology of mouse SCD1. a, The crystal structure 
of mouse SCD1 is shown from two perpendicular orientations in the 
membrane, with the cytosolic side on top. Two zinc ions bound to the cytosolic 
domain are shown as grey spheres. b, Topology diagram of SCD1, with helices 
coloured by the same scheme as in a. Orange spheres represent conserved 
histidine residues involved in coordination of the dimetal centre. 

Figure 2 | Architecture of the acyl-CoA binding site. a, Two views of the 
SCD1 structure with the C1 domain coloured teal and the C2 domain coloured 
magenta. The bound stearoyl-CoA is shown as yellow, red and blue sticks, and 
the two zinc ions as grey spheres. b, A close-up view of the CoA binding site 
shown as a surface representation (left), or with residues interacting with the 
CoA moiety shown as sticks (right). c, A close-up view of the substrate tunnel 
housing the acyl chain shown as a surface cross-section (top), or with residues 
forming the kink (Thr257, Gln143 and Trp149), hydrogen bonding to the 
acyl oxygen (Trp258), and capping the end of the substrate tunnel (Tyr104 and 
Ala108) shown as sticks (bottom). d, The locations of SCD3 mutations studied 
by yeast complementation experiments are mapped onto the SCD1 structure 
as green spheres. Residue positions are labelled according to the SCD3 
sequence. e, Monounsaturated fatty acids in the total lipid of yeast L8-14C 
transformed with mouse SCD1, SCD3 or mutated SCD3 enzyme. SCD1 
produces a mixture of 16:1 and 18:1 fatty acids, while SCD3 produces 16:1 
nearly exclusively. The combination of Ile112Ala and Glu113Leu mutations 
converts SCD3 into an enzyme that yields a proportion of monounsaturated 
fatty acids indistinguishable from SCD1. The corresponding residues in SCD1, 
Ala108 and Leu109, lie at the end of the substrate binding tunnel. Error bars are 
s.d. of 3 technical replicates. f, Expression levels of the SCD3 mutants were 
similar as detected by western blotting. Uncropped gel is shown in Extended 
Data Fig. 6. 

In addition to the bound stearoyl-CoA molecule, SCD1 also con- 
tains two metal ions. The metal ions in the current structure were 
identified as zinc by X-ray fluorescence, and by diffraction data col- 
lected at a wavelength near the zinc absorption edge that yielded two 
prominent anomalous difference peaks in each protein (Extended 
Data Fig. 5b-e). Incorporation of zinc instead of iron into the protein 
was probably an artefact of protein overexpression, and zinc remained 
the predominant metal species even when the growth media and puri- 
fication solutions were supplemented with iron. 
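
As a rough check on the wavelength choice, the photon energy for a given X-ray wavelength can be compared with tabulated K absorption edges. The sketch below is illustrative only: the phasing wavelength (1.254 Å) is the one quoted in the Methods, while the conversion constant and the approximate K-edge energies for zinc (about 9.66 keV) and iron (about 7.11 keV) are standard reference values supplied here as assumptions, not figures taken from this paper.

```python
# Illustrative sketch: convert an X-ray wavelength to photon energy and compare
# it with approximate K-edge energies (reference values, not from this paper).

HC_KEV_ANGSTROM = 12.39842               # h*c expressed in keV * angstrom
K_EDGE_KEV = {"Fe": 7.112, "Zn": 9.659}  # ASSUMPTION: approximate tabulated K edges


def photon_energy_kev(wavelength_angstrom: float) -> float:
    """Photon energy (keV) for a wavelength given in angstrom."""
    return HC_KEV_ANGSTROM / wavelength_angstrom


def edges_excited(wavelength_angstrom: float) -> list[str]:
    """Elements whose K edge lies below the photon energy at this wavelength."""
    energy = photon_energy_kev(wavelength_angstrom)
    return sorted(el for el, edge in K_EDGE_KEV.items() if energy > edge)


phasing_energy = photon_energy_kev(1.254)  # wavelength quoted in the Methods
print(f"{phasing_energy:.2f} keV, above the K edge of: {edges_excited(1.254)}")
```

At this wavelength the photon energy sits above both edges, so an anomalous peak alone would not distinguish zinc from iron; this is consistent with the text relying on X-ray fluorescence in addition to the anomalous difference peaks.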

The dimetal cluster sits at the kink in the substrate tunnel adjacent to 
C9 and C10 on the substrate, where the double bond is introduced. 
Zinc 1 (M1) is positioned 5.2 Å from C9, while zinc 2 (M2) is 4.7 Å 
from C10 (Fig. 3a). M1 and M2 are coordinated by four and five 
histidine residues, respectively, provided by the helices TM2, TM4, 
H2 and H8 (Fig. 3b and Extended Data Fig. 7a). The coordination of 
both zinc ions is consistent with an octahedral geometry with one 
missing ligand. The nine histidines are highly conserved (Extended 
Data Fig. 1), and eight of them belong to three histidine-containing 
motifs (two HXXHH motifs and one HX4H motif in SCD1) that are 
characteristic of integral membrane desaturases, alkane hydroxylases 
and xylene monooxygenases. The predominance of histidine 
ligands in SCD1 is consistent with the assignment of nitrogen-rich 
ligation of the di-iron centre in alkane hydroxylase from Mössbauer 
isomer shifts. Mutation of any of these eight histidines into an alanine 
in rat SCD1 led to a nonfunctional enzyme. The ninth histidine 
(His265) is conserved in other SCDs but had not been previously 
identified. The water molecule coordinating M1 is hydrogen bonded 
to an asparagine residue, Asn261 on TM4. Interestingly, this asparagine 
and His265 belong to an NX3H motif that is symmetrically equivalent 
to the HX4H motif interacting with M2 (Fig. 3c). Likewise, the 
two HXXHH motifs have symmetrical interactions with M1 and M2 
(Fig. 3c). 

One notable aspect of the structure is that the two metal ions are 
separated by 6.4 Å (Fig. 3a). This is longer than in any previously 
solved structures of soluble di-iron enzymes, including the soluble 
plant acyl-ACP desaturases that catalyse the same reaction as SCD1. In 
these soluble enzymes, the two iron ions are bridged by a glutamate 
residue with bidentate coordination (Extended Data Fig. 7b) that 
constrains the inter-iron distance to roughly 3-4 Å (refs 19-21), 
and permits formation of reaction intermediates such as cis-μ-1,2 
peroxo and diferryl. In SCD1, there is no carboxylate coordination 
between the metal ions. This is probably not an artefact caused by 
Zn²⁺, as the closest glutamate or aspartate residues are over 6 Å away 
from the metal centres and provide hydrogen-bonding interactions 
to the metal-bound His residues (Extended Data Fig. 7a). Since 
Zn²⁺ has an ionic radius of 0.88 Å, it may have served as a reasonable 
substitute for Fe²⁺ (ionic radius 0.92 Å) in terms of size and charge 
during heterologous expression. Notably, the bound Zn²⁺ ions, which 
often have tetrahedral coordination, have octahedral coordination 
in the SCD1 structure as typical for iron ions. Given the similar 
B-values for residues around the observed metal sites, the presence 
of stearoyl-CoA bound in an appropriately kinked configuration relative 
to the metals, and the absence of reasonably positioned carboxylate 
residues that might serve as bridging ligands, we propose that the 
unfortunate presence of Zn²⁺ has not significantly altered the structure 
and that the SCD1 structure indicates a new avenue for activation 
of O₂ in biological oxidation reactions. 

Figure 3 | The dimetal centre. a, b, Two views of the dimetal centre and 
coordinating residues, marked with distances between the zinc ions and C9 and 
C10 on the substrate (a), and coordination distances (b). The zinc ions and an 
ordered water molecule are shown as grey and red spheres, respectively. See 
Extended Data Fig. 7 for a stereo view. c, Schematic showing the locations of 
the coordinating His and Asn residues in four conserved motifs on TM2, 
TM4, H2 and H8. 

Two known aspects of the desaturase reaction are compatible with 
this active site. Removal of the pro-R hydrogen atom from C9 proceeds 
with kH/kD of ~6-7 (refs 9, 25), indicating that this step is 
rate-limiting. The distance from the pro-R C9 hydrogen atom to the 
water bound to M1 is 3.5 Å, possibly corresponding to the direction 
through which the desaturation reaction will initiate. Furthermore, no 
oxygen atom transfer to carbon is anticipated during the desaturation 
reaction. The enforcement of a long distance between the acyl chain 
and metals would be consistent with promotion of an electron transfer 
mechanism in which O atoms are retained on an oxidizing metal 
centre as electrons and protons are extracted from the bound and 
sterically configured acyl chain. 

Release of a desaturated acyl chain from the active site merits 
additional consideration. Given the kink and the narrow aperture 
of the substrate tunnel, it seems unlikely that substrate entrance or 
product release can occur by simple linear diffusion in and out of the 
tunnel. However, a break in the hydrogen bond between Gln143 
and Thr257 below the kink in the substrate tunnel would create a 
fenestration into the hydrophobic core of the membrane (Fig. 2c), 
allowing lateral transfer of substrates and products into and out of the 
well-formed substrate tunnel. A separation between TM4 and the 
loop between helices H1 and H2 could break this bond (Extended 
Data Fig. 8). 

Figure 4 | Proposed interactions between cytb5 and SCD1. a, Electrostatic 
surfaces of cytb5 (left) and mouse SCD1 (right) oriented with their proposed 
interaction surfaces towards the viewer. b, A proposed model of the complex 
between cytb5 and SCD1. Cytb5 is coloured green; domains C1 and C2 on 
SCD1 are teal and magenta, respectively. The inset shows a closer view of the 
proposed electron transfer pathway, with the haem domain shown as blue 
sticks and the metal ions as grey spheres. The last residue resolved in the cytb5 
structure is approximately 35 Å from the predicted position of the bilayer. 
The 18-residue linker from this residue to the transmembrane helix of cytb5 
should therefore be long enough to allow for the haem-containing domain to 
bind in this orientation. 

In SCD1, the electrons needed for the desaturation reaction are 
obtained from cytochrome b5 (cytb5), which in turn obtains electrons 
from NAD(P)H via cytochrome b5 reductase. Although it is known that 
cytb5 consists of an N-terminal haem-binding domain and a C-terminal 
membrane anchor domain, and must be membrane-anchored to 
function in the desaturation reaction, the cytb5 binding site on SCD 
is unknown. Examination of the mouse SCD1 structure shows that 
the dimetal centre could be accessible from the cytoplasm via a 
groove formed between the soluble domains C1 and C2. In the crystal 
structure, the N terminus of the protein lies along this groove 
(Extended Data Fig. 9); however, it forms few interactions with the 
cytoplasmic domain and potentially could be displaced by cytb5. 
Electrostatic calculations on the two proteins demonstrate that the 
predominantly positive surface of SCD1 is well complemented by the 
mostly negative surface of cytb5 (Fig. 4a). Placement of the cytb5 
functional domain along this groove would place the electron donor 
groups within 14 Å of the dimetal centre, an acceptable distance for 
electron transfer between biological redox centres (Fig. 4b). His157 
(H2) and His298 (H8) sit directly above the two metal ions and 
are potential candidates to form an electron transfer interface. This 
proposed complex also places negatively charged residues on cytb5 
demonstrated to be necessary for complex formation close to 
positively charged residues on H4 of SCD. The structure of mouse SCD1 
enables us to address questions related to basic mechanisms of the 
desaturation reaction more precisely. 
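
To give a feel for why 14 Å counts as an acceptable distance, the sketch below evaluates a widely quoted empirical expression for electron tunnelling rates in proteins (usually attributed to Moser, Dutton and co-workers): log10 k = 15 - 0.6R - 3.1(ΔG + λ)²/λ, with R the edge-to-edge distance in Å and the driving force ΔG and reorganization energy λ in eV. The expression itself, and the ΔG and λ values used, are assumptions introduced here purely for illustration; the paper reports no rates.

```python
# Illustrative sketch of an empirical distance/rate relation for protein
# electron transfer. All energies below are HYPOTHETICAL example values.

def log10_et_rate(distance_angstrom: float, delta_g_ev: float, reorg_ev: float) -> float:
    """log10 of the tunnelling rate (s^-1) from the empirical expression
    log10 k = 15 - 0.6 R - 3.1 (dG + lambda)^2 / lambda."""
    return 15.0 - 0.6 * distance_angstrom - 3.1 * (delta_g_ev + reorg_ev) ** 2 / reorg_ev


DISTANCE = 14.0  # angstrom, the donor-acceptor distance quoted in the text

# Upper bound: activationless transfer, where -dG equals lambda.
best_case = log10_et_rate(DISTANCE, delta_g_ev=-1.0, reorg_ev=1.0)

# A less favourable, still hypothetical, case with a small driving force.
weak_drive = log10_et_rate(DISTANCE, delta_g_ev=-0.1, reorg_ev=1.0)

print(f"log10 k (activationless): {best_case:.2f}")
print(f"log10 k (dG = -0.1 eV, lambda = 1.0 eV): {weak_drive:.2f}")
```

Under these assumed parameters the estimated rate stays well above typical enzymatic turnover rates, which is the sense in which a 14 Å separation is usually regarded as workable.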

Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
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METHODS 


No statistical methods were used to predetermine sample size. 

Desaturase expression and purification. To identify an optimal SCD candidate 
for crystallization, we tested 24 eukaryotic homologues for overexpression in High 
Five (Trichoplusia ni) insect cells using the Bac-to-Bac system (Invitrogen) to 
generate virus. During the process of identifying suitable homologues, we found that 
the InterPro fatty acid desaturase family 1 motif, G-E-X-[FYN]-H-N-[FY]-H-H- 
X-F-P-X-D-Y, is found in more than 7,000 eukaryotic sequences (fatty acid desa- 
turase type 1, InterPro accession IPR005804) from animals, plants and fungi. The 
three His residues from this motif are now known to provide ligands to both metal 
ions, and so provide an excellent indicator for the presence of a mouse SCD1-like 
protein fold. Inclusion of bacterial genes having a related three-His motif expands to 
more than 22,000 sequences with 90 distinct domain architectures where a mouse 
SCD1-like fold is fused to domains with a redox centre such as cytb5, plant-type and 
Rieske-type ferredoxins, rubredoxins and thioredoxins, as well as domains for 
peptidases, hydrolases, kinases, phosphatases, and many others. This diversity of 
domain architectures suggests a considerably greater versatility for the SCD1 fold 
than the originally anticipated desaturase and alkane hydroxylase activities. 
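
The motif quoted above can be turned into a regular expression for scanning protein sequences. The following is a minimal sketch, not the InterPro search procedure: the helper names and the toy sequence are hypothetical, and `X` is simply treated as "any residue".

```python
import re

# Pattern quoted in the text; X stands for any residue, brackets list alternatives.
MOTIF = "G-E-X-[FYN]-H-N-[FY]-H-H-X-F-P-X-D-Y"


def motif_to_regex(motif: str) -> re.Pattern:
    """Translate the dash-separated motif into a compiled regular expression."""
    parts = ["." if token == "X" else token for token in motif.split("-")]
    return re.compile("".join(parts))


def find_motif(sequence: str, pattern: re.Pattern) -> list:
    """Return (1-based start position, matched segment) for each occurrence."""
    return [(m.start() + 1, m.group()) for m in pattern.finditer(sequence.upper())]


pattern = motif_to_regex(MOTIF)

# HYPOTHETICAL toy sequence built to contain one match; not a real protein.
toy_sequence = "MKT" + "GEGFHNYHHTFPQDY" + "AAS"
hits = find_motif(toy_sequence, pattern)
print(pattern.pattern, hits)
```

A scan like this only flags candidate sequences; whether a hit really adopts an SCD1-like fold would still need to be judged from the full alignment.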

The pFastBac vector was modified to include a tobacco etch virus (TEV) 
protease cleavage site before the C-terminal polyhistidine tag. The majority of 
the 24 homologues expressed, but the mouse SCD1 clone (GI:13938635, Life 
Technologies Open Biosystems) gave the highest yield and stability. Since full- 
length mouse SCD1 was prone to aggregation and precipitation, we designed five 
more mouse SCD1 constructs of various N-terminal truncations based on sec- 
ondary structure prediction and sequence conservation. The construct containing 
residues 24-355 was stable and yielded diffracting crystals. 

Cells were infected with baculovirus at a density of ~3 × 10⁶ cells ml⁻¹ and 
were grown at 27 °C for 48-56 h before being collected by centrifugation at 2,000g 
for 20 min. Cell membranes were isolated from cell pellets following a published 
protocol. In brief, cell pellet from 1 l of culture was lysed in 50 ml hypotonic 
10 mM HEPES, pH 7.5, containing 10 mM NaCl, 5 mM MgCl₂ and 25 µg ml⁻¹ 
DNase I. After centrifugation at 55,000g for 45 min, cell membranes were washed 
with 50 ml high-osmotic buffer containing 25 mM HEPES, pH 7.5 and 1 M NaCl. 
Purified membranes were re-suspended in a low-osmotic buffer containing 
25 mM HEPES, pH 7.5, 150 mM NaCl and 40% (v/v) glycerol, flash frozen with 
liquid N₂, and stored at -80 °C. 

Purified membranes were thawed and dounced in (10 ml per gram membrane) 
20 mM HEPES, pH 7.5, 150 mM NaCl and 2 mM β-mercaptoethanol, and 
solubilized with 2% (w/v) n-decyl-β-D-maltoside (Anatrace) at 4 °C for 2 h. After 
centrifugation (55,000g, 45 min, 4 °C), the desaturase was purified from the 
supernatant using a cobalt-based affinity resin (Talon, Clontech) and the His-tag was 
cleaved by TEV protease (leaving an extra ENLYFQ peptide at the C terminus). 
Purified desaturase was collected and concentrated to 5 mg ml⁻¹ (Amicon 50 kDa 
cutoff, Millipore) and loaded onto a size-exclusion column (Superdex 200 10/300 
GL, GE Health Sciences) equilibrated with 25 mM HEPES, pH 7.5, 150 mM NaCl, 
0.18% (w/v) n-decyl-β-D-maltoside and 5 mM β-mercaptoethanol. The peak 
fractions containing desaturase were pooled and immediately used for crystallization. 
Lipidic cubic phase crystallization. Crystallization trials with 
detergent-solubilized protein and the bicelle method failed to yield crystals, whereas the in meso 
method succeeded. For lipidic cubic phase crystallization, the purified desaturase 
was concentrated to 50 mg ml⁻¹ and two volumes of protein solution and three 
volumes of molten monoolein (Sigma) were mixed with a coupled syringe device. 
Crystallization trials were performed with 96-well glass sandwich plates 
(Molecular Dimensions) and a Gryphon crystallization robot using 50 nl 
protein-lipid mixture overlaid with 800 nl precipitant solution in each well. The initial 
crystal hits were systematically optimized by screening against salt and PEG 
concentrations, pH values, and different lipids. The best crystals grew to a final size of 
~50 × 50 × 20 µm within 5 days in 100 mM MES buffer, pH 6.7-7.1, containing 
33-37% (v/v) PEG 400, 200 mM NaCl, 4% (v/v) ethylene glycol. Crystals were 
collected directly from the protein-lipid mixture using 50 µm MiTeGen 
MicroMounts and immediately flash frozen in liquid nitrogen. 
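
The quantities above imply how little protein each crystallization well consumes. The sketch below works this out from the stated 2:3 protein-to-lipid volume ratio, the 50 nl bolus and the 50 mg ml⁻¹ protein stock; the function name is hypothetical, and the calculation assumes the two volumes are simply additive on mixing.

```python
# Illustrative arithmetic: protein mass in one lipidic cubic phase bolus.
# ASSUMPTION: protein solution and monoolein volumes are additive on mixing.

def protein_per_bolus_ng(bolus_nl: float, protein_mg_per_ml: float,
                         protein_parts: float = 2.0, lipid_parts: float = 3.0) -> float:
    """Protein mass (ng) in a bolus of protein-lipid mixture.

    Uses the identity 1 mg/ml == 1 ng/nl, so nl times mg/ml gives ng.
    """
    protein_nl = bolus_nl * protein_parts / (protein_parts + lipid_parts)
    return protein_nl * protein_mg_per_ml


# Values from the text: 50 nl bolus, 50 mg/ml stock, 2:3 protein:lipid by volume.
bolus_protein_ng = protein_per_bolus_ng(50.0, 50.0)
plate_protein_ug = 96 * bolus_protein_ng / 1000.0  # one full 96-well plate

print(f"{bolus_protein_ng:.0f} ng per well, {plate_protein_ug:.0f} ug per 96-well plate")
```

On these numbers a full plate uses on the order of a tenth of a milligram of protein, which helps explain why nanolitre-scale dispensing is practical for screening.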

X-ray data collection and processing. X-ray diffraction data were collected at 
beamline 24ID-C (NE-CAT) at the Advanced Photon Source at Argonne National 
Laboratory. A data set collected from a single crystal at a wavelength of 1.254 Å 
with a resolution of 2.8 Å was used for phasing (Extended Data Table 1). The 
phasing data were processed with XDS and scaled with AIMLESS. The presence 
of zinc as the predominant ion was confirmed via fluorescence emission spectra 
using an Amptek SDD fluorescence detector, and analysis of the anomalous signal. 
A second data set with a resolution of 2.6 Å was collected on a second crystal at a 
wavelength of 0.9795 Å for use in molecular replacement. The higher resolution 
data were indexed, integrated, and scaled using HKL2000 (ref. 36). The crystals 
belonged to the space group P2₁2₁2₁ with unit cell dimensions of a = 77.06 Å, 
b = 113.77 Å and c = 141.70 Å. 
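
The cell dimensions allow a quick estimate of how densely the protein is packed. The sketch below computes the orthorhombic cell volume and a rough Matthews coefficient. The cell edges, the two molecules per asymmetric unit and the 24-355 construct boundaries come from the text; the average residue mass of 110 Da and the conventional 1.23 factor for solvent content are generic assumptions, and the uncleaved tag residues are ignored.

```python
# Illustrative sketch: unit-cell volume and a rough Matthews coefficient.

A, B, C = 77.06, 113.77, 141.70   # cell edges in angstrom, from the text
ASU_PER_CELL = 4                  # P2(1)2(1)2(1) has four asymmetric units per cell
MOLECULES_PER_ASU = 2             # two protein molecules were located
RESIDUES = 355 - 24 + 1           # crystallized construct spans residues 24-355
MEAN_RESIDUE_MASS_DA = 110.0      # ASSUMPTION: generic average residue mass


def cell_volume(a: float, b: float, c: float) -> float:
    """Volume (cubic angstrom) of an orthorhombic cell, where all angles are 90 degrees."""
    return a * b * c


def matthews_coefficient(volume: float, n_molecules: int, mass_da: float) -> float:
    """Crystal volume per unit of protein mass (cubic angstrom per Da)."""
    return volume / (n_molecules * mass_da)


volume = cell_volume(A, B, C)
mass = RESIDUES * MEAN_RESIDUE_MASS_DA
vm = matthews_coefficient(volume, ASU_PER_CELL * MOLECULES_PER_ASU, mass)
solvent_fraction = 1.0 - 1.23 / vm  # conventional estimate of non-protein content

print(f"V = {volume:.0f} A^3, Vm = {vm:.2f} A^3/Da, non-protein fraction = {solvent_fraction:.2f}")
```

For a crystal grown in lipidic cubic phase, the "solvent" fraction from this estimate also includes lipid and detergent, so a high value should not be read as bulk water alone.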

Structure determination and refinement. SHELXD/SHELXE found four 
anomalous scatterers per asymmetric unit and the resulting density modified 
map was used to build an all-helical partial model of one protein molecule. The 
phased translation function from MOLREP was then used to locate a second 
protein molecule. This partial model and the heavy atom sites were input into 
PHASER-EP, which was run in MR-SAD phasing mode. Further density 
modification was carried out with RESOLVE. The structure model was then further 
built through successive rounds with COOT and refinement with phenix.refine. 
This model was then used as the input model for molecular replacement using 
PHASER with the 2.6 Å diffraction data. The model was completed through 
successive rounds of model building with COOT and refinement with phenix.refine. 
The crystallographic map was easily interpretable. In the final stages of refinement, 
TLS groups were determined using TLSMD and protein geometry was validated 
with MolProbity. Figures were produced with PyMOL (Schrödinger LLC). 
Electrostatic surfaces were generated with Chimera. For Fig. 4, coordinates were 
used from rat cytochrome b5 (PDB accession 1BFX; ref. 46), which is 100% identical 
to mouse cytochrome b5 over the depicted residue range. 

Functional studies of mouse SCDs. For expression using the galactose-inducible 
yeast expression vector pYES-DEST52 (Invitrogen), mouse SCD1 and SCD3 
genes were cloned with an 81 nucleotide leader sequence encoding the N terminus 
of Saccharomyces cerevisiae desaturase (ole1) appended to the 5' end. Mutations of 
mouse SCD3 were made using QuikChange (Stratagene). Expression plasmids 
were transformed into L8-14C (ref. 48), an ole1 mutant yeast strain, and cultured on 
agar plates containing 0.5 mM each of oleic acid and palmitoleic acid. SCD 
expression was induced on agar plates containing 2% galactose. SCD expression 
was detected by western blotting using a His-Tag monoclonal antibody (Novagen, 
70796-3), goat anti-mouse IgG AP conjugate secondary antibody (Novagen, 
69266-3), and alkaline phosphatase reagent (Novagen). Total fatty acids were 
determined as methyl esters using gas chromatography and mass spectroscopy. 
Extended Data Figure 1 | Sequence alignment of mouse SCD1 with other 
integral membrane desaturases. The N terminus of mouse SCD1 is not 
shown. For all the other sequences, only the region aligning to mouse SCD1 is 
included. Secondary structure elements from the mouse SCD1 crystal structure 
are labelled. Residues discussed in the text are highlighted in red (histidines in 
the primary coordination sphere of the dimetal unit), purple (carboxylates in 
the secondary coordination sphere of the dimetal unit), blue (acyl-chain 
binding site), yellow (CoA-binding site), green (residues that may determine 
the length of bound acyl chains), black (mutations that change the substrate 
specificity in mouse SCD3) and grey (Arg249 in the transmembrane region). 
The accession numbers for sequences included in the alignment are: 
mouse SCD1 (GI: 31543675), mouse SCD3 (GI: 13277368), human SCD1 
(GI: 53759151), zebrafish SCD1 (GI: 28394115), D. melanogaster desat2 
(GI: 24646295), C. hyperboreus ChDes9-1 (GI: 589834955), C. elegans FAT5 
(GI: 544604099), delta-9 desaturase from Synechocystis sp. PCC 6803 
(GI: 339274799), delta-9 desaturase from A. thaliana (GI: 18402641), and 
yeast OLE1 (GI: 1322552). 


Extended Data Figure 2 | Structural role of Arg249. The conserved arginine 
residue Arg249, located on TM4 within the transmembrane region of the 
protein, forms a hydrogen bond with the carbonyl oxygen of Cys222 on TM3. 
This interaction may help stabilize the kink in TM3 caused by Pro226 on the 
following turn. 
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Extended Data Figure 3 | Structure of the SCD1 cytoplasmic domain. Four views of the cytoplasmic domain. The proposed amphipathic helices are coloured 
blue, while the other helices forming the cytoplasmic domain are green. 
Extended Data Figure 4 | The mouse SCD1 crystal lattice. Cross-sections of 
the crystal lattice for the P2₁2₁2₁ mouse SCD1 lipidic cubic phase crystals, 
viewed from two perpendicular directions. One asymmetric unit is coloured 
blue. Within an individual asymmetric unit, interactions between the two 
chains are mediated by residues from a C-terminal cloning artefact. All 
interactions with chains in neighbouring asymmetric units involve antiparallel 
orientations of the interacting monomers and have small interface areas. 
Extended Data Figure 5 | Electron density maps for acyl-CoA and the 
dimetal centre. a, Stearoyl-CoA bound to SCD1 is superposed with the 
weighted 2Fo - Fc electron density contoured at 1.5σ (left) or Fo - Fc electron 
density calculated with the substrate molecule omitted and contoured at 2.3σ 
(right). b, Stereoview of the dimetal centre and coordinating histidines, shown 
with the weighted 2Fo - Fc density contoured at 2σ. c-e, The dimetal centre 
superposed with the anomalous difference map, contoured at 5σ (c), the 
Fo - Fc density calculated with the zinc atoms omitted, contoured at 3σ (d), 
and the Fo - Fc density calculated with the ordered water molecule between M1 
and Asn261 omitted, contoured at 3σ (e). 
Extended Data Figure 6 | Western blot analysis of SCD expression. a, b, Analysis of two separate yeast expression trials after introduction of mutations to mouse SCD3 to impart catalytic specificity of mouse SCD1. Contents of lanes are as indicated in the gel. The position of SCD is indicated by a black star. Additional bands are other proteins detected by the polyclonal antibody. Dotted line in a shows the portion of the complete gel image included in Fig. 2f; dotted line in b shows the corresponding expression trials from the second experiment. a, Expression trial 1, with gel artefacts in lanes 2 and 3. b, Expression trial 2, with gel artefacts in lanes 4, 6 and 7.
Extended Data Figure 7 | Coordination in diiron-containing desaturases. a, Stereoview of residues forming both the first and second coordination shell around the dimetal centre in mouse SCD1. b, Stereoview of the coordination of the dimetal centre in the stearoyl-acyl carrier protein desaturase from the castor bean (PDB accession 1AFR).
Extended Data Figure 8 | The Thr257-Gln143 hydrogen bond blocks product egress. The surface of the substrate tunnel housing the acyl chain is shown, with the structural elements AH1, H2 and TM4, and the hydrogen-bonded residues Thr257 and Gln143 highlighted in the inset. The proximity of these two residues creates the kinked shape of the substrate tunnel, and their separation would result in a larger opening capable of releasing the product into the bilayer.
Extended Data Figure 9 | The SCD1 N terminus. Two perpendicular views of mouse SCD1, from within the plane of the membrane and from the cytoplasmic side, showing the interaction between the N terminus (red ribbon) and the cytosolic domain (beige surface). The dashed yellow circles indicate the approximate location of the metal atoms.
Extended Data Table 1 | Crystallographic and structure refinement statistics

                        Phasing Data Set                Refinement Data Set
Data collection
  Space group           P2₁2₁2₁                         P2₁2₁2₁
  Unit cell (Å)         a=77.51, b=114.53, c=140.97     a=77.06, b=113.77, c=141.70
  Wavelength (Å)        1.2541                          0.9795
  Resolution (Å)        38.8 - 2.8                      47.4 - 2.6
  Rsym                  0.18 (1.17)                     0.11 (0.53)
  I/σI                  9.5 (1.3)                       14.4 (3.1)
  Completeness (%)      99.1 (94.0)                     98.0 (94.0)
  Redundancy            7.9 (7.5)                       6.2 (5.0)
Refinement
  Resolution (Å)                                        47.4 - 2.6
  No. reflections                                       38016
  Rwork/Rfree                                           20.3 / 23.5
  No. atoms
    Protein                                             5231
    Ligand/ion                                          188
    Water                                               86
  B-factors (Å²)
    Protein                                             46.8
    Ligand/ion                                          51.5
    Water                                               40.1
  R.m.s. deviations
    Bond lengths (Å)                                    0.002
    Bond angles (°)                                     0.633
  Ramachandran plot
    Favored (%)                                         93.8
    Allowed (%)                                         6.2
    Outliers (%)                                        0.0
The rise of blockbuster biological drugs should translate into hiring opportunities for early-career scientists.

Drug hunters wanted


The biotherapeutics industry is burgeoning — and it needs scientists with specialized 
disease knowledge and technical savvy to join in the drug-discovery efforts. 


BY JEFFREY M. PERKEL 


A newly approved class of anti-cholesterol
drugs could be the latest in a long
line of biopharmaceutical ‘blockbusters’.
These drugs not only produce big revenue for 
pharmaceutical companies, but also represent 
employment opportunities for early-career 
scientists who want to develop cutting-edge 
therapies. To get into the game, aspiring young 
researchers must tailor their training and skills 
to the industry. 


Biopharmaceuticals — or ‘biologics’ — 
are complex drugs that are manufactured or 
extracted generally from biological sources. 
They include proteins produced in engi- 
neered cells, other large molecules and live 
cells. A 2013 report by the Pharmaceutical 
Research and Manufacturers of America, 
a trade association in Washington DC, said that 
907 biologics were in development in the 
United States alone (Medicines in Development: 
Biologics PhRMA; 2013). And drug manufacturers
worldwide are keen to expand the field.


In July, the US Food and Drug Administration 
(FDA) approved a cholesterol-lowering bio- 
logic called alirocumab. A similar compound, 
evolocumab, could get full approval later this 
year; it passed the FDA’s preliminary regulatory-assessment
panel in June. The drugs are the first
candidates in their class — known as PCSK9 
inhibitors, after the protein whose function they 
block — to advance this far in any drug com- 
pany’s pipeline. Alirocumab and evolocumab 
both belong to a group of medications called 
monoclonal-antibody therapeutics, which
work by binding to and altering target molecules;
PCSK9 inhibitors specifically seek out
and inactivate a protein in the liver, leading to 
a decrease in the amount of low-density lipo- 
protein (LDL) cholesterol in the bloodstream. 

Some analysts project multibillion-dollar 
sales for these two cholesterol combatants. 
The 20 highest-grossing biologics in 2013 
generated more than US$1 billion each, 
and the top three collectively raked in 
more than $28 billion (Nature Biotechnol. 
32, 992-1000; 2014). “I think it’s fair to say 
we expect to see strong growth in biothera- 
peutics,” says Raymond Amato, who oversees 
hiring for the research and development divi- 
sion of pharmaceutical company Pfizer in 
Groton, Connecticut. 


BREAK INTO BIOLOGICS 

Early-career researchers who hope to get a 
toe in the door need detailed knowledge of 
relevant areas, such as neurobiology, cardio- 
vascular disease or immunology, and should 
be deeply familiar with specialized techniques 
such as antibody engineering, next-generation 
DNA sequencing, bioinformatics or genetic 
manipulation. It also helps to have proven 
problem-solving abilities, creativity and 
leadership skills — and the capacity to work 
in a team, says Mark Kowala, chief scientific
officer of cardiometabolic and diabetic com- 
plications at Eli Lilly, a health-care company 
in Indianapolis, Indiana. 

Lilly alone has about 100 job openings in 
therapeutic development, including biologics. 
In the biopharma arena, the company’s research 
and development positions are wide-ranging 
and include lab technicians, senior scientists, 
statisticians, pharmacologists and toxicolo- 
gists, says Jennifer Porath, a human-resources 
director for global recruiting at Lilly. A PhD 
is not necessarily required, she adds. Kowala’s 
department, for example, has two lab heads 
without PhDs, who represent about one-tenth 
of the group. Competition for these jobs is 
fierce. Paul Nioi, a translational biologist at 
Amgen in Cambridge, Massachusetts, says 
that his most recent postings attracted about 
50 applicants each over the first week. At Pfizer, 
advertisements for entry- to mid-level-research 
positions draw 150 to 250 candidates apiece,
Amato says.

So how can early-career researchers
boost their viability as candidates?
In addition to fundamental-biology knowledge
and lab skills, it helps 
to gain familiarity with concepts important to 
the biopharmaceutical business, such as the 
clinical development and large-scale produc- 
tion of monoclonal antibodies, recombinant 
proteins and cellular therapies. Those who 
hope to work for specific companies should seek out knowledge relevant to the firms’ drug-development targets. Amato says that Pfizer, for example, in collaboration with the gene-editing firm Cellectis in Paris, has begun to pursue cell-based immunotherapies. Among the technical skills required for those jobs are primary T-cell culture and experience with cell manufacturing, characterization and quality control.

John Beals, a protein chemist who advises on bioproduction at Lilly, hires engineers who can clone and express genes, measure protein properties and assess molecular structure. Pharmaceutical giant GlaxoSmithKline (GSK) looks for scientists who are skilled in analytical chemistry, cell biology and protein purification, says Joseph Tarnowski, a senior vice-president at GSK’s office in Upper Merion, Pennsylvania.

A skill set that was directly translatable to biologics development proved invaluable for Wei Ni, a research scientist who joined Lilly straight from her postdoc at the University of California, San Francisco. Her expertise in pharmacology and molecular biology helped to set her apart from her competitors, says Kowala, who hired her for his drug-hunting team.

Ni also demonstrated several crucial ‘soft’ skills (see ‘More than lab savvy’). Lilly asks candidates to give a research seminar during their job interviews. Ni took the opportunity to show that she could think both broadly and creatively, and communicate her ideas well. “My first two slides were on epidemiology and clinical trials that were relevant to my research,” she says. “I think they were a little surprised that I was thinking about the disease relevance of my work.” She also demonstrated an ability to develop alternate ideas when her initial hypothesis was proved wrong, and to find ways to fill technical gaps in her expertise (in this case, quantifying biological phenomena from images).

What matters most, says Stan Crooke, chief executive officer at Isis Pharmaceuticals in Carlsbad, California, is what a scientist does when handed full responsibility for a research project and concomitant resources. “Decide what area of science you’re interested in, demonstrate the highest-quality performance, and that,” he says, “is the answer.”


Jeffrey M. Perkel is a freelance writer in
Pocatello, Idaho.


SOFT TOUCH


More than lab savvy


When it comes to nabbing a job in the biopharmaceutical industry, ‘soft’ skills matter, and among the most important is the ability to communicate. As in all scientific disciplines, scientists in biopharmaceutical research and development must write papers and give presentations. But they must also communicate ideas to company executives and colleagues and pitch a message effectively to both audiences. “It’s sometimes like being a used-car salesman, as well as being a scientist,” says Andrew Adams, a senior research scientist at the pharmaceutical company Eli Lilly in Indianapolis, Indiana. “You have to be really persuasive to people who sometimes will be at odds with the idea that you’re putting forward.”

Diana Ritz, a principal scientist who works for GlaxoSmithKline (GSK) in Upper Merion, Pennsylvania, can attest to the importance of communication skills. She was hired directly from a PhD programme in chemical engineering, and notes that the doctorate taught her how to make a research project particularly attractive to biopharma companies and how to frame it for industry eyes. Presenting talks over the years also helped her to hone her message, she says. “It’s a matter of focusing on things like increases in productivity, time and cost savings versus mechanistic minutiae that might get academics going, but don’t impact the bottom line at all,” she says.

Being able to think outside the box is a plus, says Adams. “Can you make mediocre ideas into good ideas by applying creative thinking?” he asks. Joseph Tarnowski, a senior vice-president at GSK, recommends that candidates’ applications include an example of an unconventional project or strategy, to show that an applicant is curious and innovative. “That would get our attention,” he says.

Other key attributes include self-motivation, enthusiasm, the ability to think on one’s feet, and a genial personality. “You’re going to spend a lot of time with them; you have very challenging assignments,” says John Beals, a team leader at Lilly. “If you are a joy to work with and people like you, then you get a ripple effect of better relationships and better problem-solving.” J.M.P.


SCIENCE FICTION


UNINHABITABLE ZONE 


BY IAN STEWART 


The primary gleamed benignly against
a background dusted with stars. From
the inner system it would have blazed
angrily in the sky, but this planet was sufficiently
distant to be safe.

“Perfect!” Unit-Peripheral declaimed. 
“I find it hard to believe that such a beautiful,
balmy world can be lifeless — yet it is! I will 
authorize colonization immediately.” 

“The world is currently lifeless,” Plug-in-43
pointed out. “The sondage team suspects that
living creatures were once present here. They
left ... traces. Indicative of bipedality.”

“Garbage! Nothing can locomote on two 
appendages! The arrangement is unstable. 
The marks were made by a meteoroid impact.”

“Perhaps. The sondage team has formulated 
an alternative hypothesis. It is... strange.” 

“It would have to be. What is it?”

“Extremophiles.” 

Unit-Peripheral paused to recompute and 
got the same result. “Nonsense. Extremo- 
philes occupy extreme habitats.” 

“Extremism is relative, Unit-Peripheral. 
These beings made a brief visit, decided this 
world was not to their liking, and departed.”

“Where to?” 

“The sondage team suspects... absurd, 
I know, but they take their duty to be 
open-minded terribly seriously ... that the 
extremophiles departed for the inner system.”

Unit-Peripheral expressed indicatory 
signals in the direction of the severely over- 
heated local star. “Into that?” 

Plug-in-43 said nothing, but q-mailed an 
image: two parallel linear series of depres- 
sions, melted into the rock. 

“The inference of warmlife is highly 
improbable,” Unit-Peripheral stated. “This
system's gas and ice giants are far too close to 
the star to be habitable. The cryonic zone starts 
with the ice dwarves, of which this is the inner- 
most. Where does the sondage team imagine 
these extremophiles reside? On a comet?” 

Plug-in-43 q-mailed another image. 
“Barrenworld-3. It resembles Fumarole in 
our own system.” 

“Where rocks run molten, forming 
magma oceans.” 

“As they do on Barrenworld-3. The sond- 
age team suspects this system has evolved 
extremophiles that can inhabit such worlds. 
In fact, they suggest these organisms might 
largely consist of molten rock.” Noticing Unit- 
Peripheral’s thunderous mode-expression, 
Plug-in-43 hastened to expand. “Held in ... a
huge multitude of ... tiny cocoons ... able to
stand the searing tempera —”

Approach with extreme caution.

“Clusterglitch! Even extremophiles can’t 
be that extreme! The radiation alone —” 

“Barrenworld-3 has a protective magnetic 
field.”

Unit-Peripheral came to a reluctant 
decision. “Protocol requires me
to act as if I take these insane
conjectures seriously. Dispatch
six-by-six hardened probes.”

The crew descended 
to the planet's orange 
and brown surface for 
a refreshing sunbath, 
and waited. Eventually 
one heavily damaged 
probe struggled back. Its 
report was disturbing. Not 
only a gaseous atmosphere, 
but — 

“Oxygen?” 

“Six-plus-one six-by-sixths by volume.”

“And the cyanic areas truly are liquid rock?” 

“Molten dihydrogen monoxide, Unit- 
Peripheral.”

“Despite vast amounts of atmospheric 
dioxygen carbide, indicative of past carbidation
events on a huge scale, there remains free
oxygen? Why did such a corrosive substance 
not combine with other elements long ago?” 

“It did. It still does. But the poison is 
regenerated.” 

Unit-Peripheral emitted a warning flash. 
“This smacks of theory-saving! How?” 

“By extremophiles.” 

The flash shorted out, destroying a food- 
booth. “Whenever something impossible is 
proposed, it is justified by invoking extremophiles!
I’ve had enough of this nonsense. I
cannot accept that intelligent space-traversing 
creatures can regenerate poison gas.” 

“Apologies. I was unclear. The poison is 
regenerated by a different extremophile.” 

Unit-Peripheral vibrated in a mix of anger 
and terror. “There’s more than one of them?” 

“Millions of formats, all multiple upgrades.” 

“Even with a magnetic field, these creatures 
must live underground.” 

“No, they lie around the edge of the 
magma and ... sunbathe. Like us.”

“At least they have the sense to stay out of 
the liquid rock.”

“No, they periodically immerse themselves 
in it. To — uh — I know no other way to say 
this, Unit-Peripheral. To keep cool.” 

“Cool? The probe is suffering from an 
overactive imagination. Oxygen is one of the
most corrosive gases known to cryokind! How 
do these creatures avoid reacting with it?” 

“They don’t. They use it as an energy 
source.” 

“But surely the organisms themselves 
would oxidize!” 
Plug-in-43 abased its bias cur- 
rents as a submission gesture. 
This one would not be well 
received. “Sometimes 
they do. The probe saw 
localized conflagrations 
in which thousands of 
extremophiles died.”
“The bipedal ones?” 
“No. Sessile colonies. 
The team names them 
‘fire-forests’. They cannot
reproduce unless there is a 
conflagration.” 
“They have babies by setting 
themselves on fire?” 

“Only some of the sessile formats. Other 
formats use different methods — all bizarre.”

“They don't flake off circuit copies like 
we do?” 

“No, they... never mind, you wouldn't 
believe it if I told you.” 

“Do the bipeds set themselves on fire?” 

“No, not deliberately. However, they do 
derive their energy from millions of diminu- 
tive internal fires —” 

“Enough! Another word and you will be 
demoted to long-term storage! I am stacked
up to the back slots with this arrant non- 
sense! Put out a general order.” 

“We are depositing a colony as planned? 
The conditions here are ideal for our super- 
conductive brains, and —” 

“No! We are departing forthwith. We 
will report that this system has no habitable 
worlds. Which is true. For if any of us were 
to live here for more than a few cycles, the 
mere thought of those things cavorting in 
magma would drive us mad!” 

Unit-Peripheral’s sensorium swivelled back 
to the q-mailed image: a multicoloured globe 
with ugly patterns of toxic cyan, sickly green 
patches, deathly brown scars, all overlain with 
ghastly albescent tendrils. It paused to purge 
its processors, and uttered its final words on 
the topic, dripping with scorn. “Why couldn't 
they have been normal, like us?”


Ian Stewart, emeritus professor at the 
University of Warwick, writes popular 
science books and science fiction. 
As long ago as 1803, Napoleon
Bonaparte apparently pointed
to China on a map and warned:
“Here lies a sleeping giant. Let him
sleep, for when he wakes up, he will
shock the world.”

These days, as the world has seen, 
the giant is well and truly awake, not 
least when it comes to science. China 
has almost doubled the proportion 
of its GDP spent on R&D over the 
last decade, and in 2013 overtook 
the combined R&D spending of the 
28 nations of the European Union. 
The Organisation for Economic 
Co-operation and Development 
predicts it will leapfrog the US by 2020. 

For anyone considering their career 
in science, China's rise is reason 
enough to think about a move to Asia, 
but it is not the only one. A little to 
the east, South Korea is second only 
to Israel in the proportion of its GDP 
it spends on R&D, and Japan is not 
far behind. Singapore has built up its 
research and innovation capacities 
rapidly since the turn of the century 
by luring foreign talent with offers 
of large salaries. Australia and New 
Zealand are playing to their strengths 
by focusing limited resources on the 
fields in which they excel. 
Researchers in the other parts of
the world that have traditionally 
dominated the scientific playing field 
are facing tough times as competition 
for jobs and grants increases, and 
R&D budgets shrink or flatline. The 
UK, for example, spent 1.6% of GDP
on R&D in 2013, less than it did a
decade earlier. The US share of global R&D
spending fell from 34% to 30% in the
decade to 2011.

Hardly surprising then that many 
ambitious scientists are embracing 
opportunities in the Asia-Pacific 
region. Of course, taking a job far from 
home is a big leap, so in this Naturejobs 
Career Guide we present first-hand 
accounts of what it’s like to work in this 
region along with tips from employers 
and key facts and figures. 

Most people who take up overseas 
opportunities have positive, career-
and life-changing experiences. For a
minority it doesn't work out because 
of unexpected challenges. Those of us 
who have put this guide together hope 
it increases the likelihood that readers 
considering taking the plunge will be 
glad they did. 
RISING IN THE EAST


When weighing up a country as a potential place to live and work,
it is worth considering how it compares with the rest of the region for science
output, government funding and general cost and quality of life.


HOW SCIENCE IS SPREAD


There are both giants and minnows in the Asia-Pacific region when it comes to the number
of academic papers included in the Nature Index, a database that tracks the national
affiliations of articles published in high-quality journals. This is clear in terms of both raw
article count (AC) and adjusted weighted fractional count (WFC). China’s emphasis on
chemistry and New Zealand’s focus on Earth and environmental sciences stand out.
See below left for a detailed description of the Nature Index.


WFC: 6,037 
AC: 8,641 


China ranks second only to the 
United States on overall AC. 
Among the six highlighted 
Asia-Pacific countries, it has the 
highest proportion of chemistry 
articles at 60% of WFC, and the 
lowest proportion of life sciences 
articles at 12% of WFC. 


Bubbles are sized according to 
WFC. The pie charts represent 
the countries’ output in the 

categories below. 


• Life sciences
• Earth & environmental sciences
• Chemistry
• Physics


*Each slice represents the proportion that each 
subject area contributes to an institution's 

overall WFC. Subject areas can overlap, so the 
total percentage may exceed 100%. 


At just under 5.5 million, 
Singapore’s population 
is the smallest among 
the six countries. If WFC 
were calculated per 
capita, Singapore would 
come out on top by far. 


WFC: 521 
AC: 873 


WFC: 1,168 
AC: 1,969 


WFC: 3,200 
AC: 4,976 


Japan has slightly less than 
10% of the population of 

China, but produces more 

than half its neighbour’s. 


SINGAPORE 


The physical sciences contribute 46% 
of South Korea’s WFC — the highest 
proportion in the six countries. 


Australia leads the 
way in life sciences in 
the region, with 35% 
of its WFC attributed 
to the field. 


WFC: 951 
AC: 2,497 


NEW ZEALAND

WFC: 96
AC: 275

New Zealand has by far the lowest article output in the
region, but is unique in having more than one-third (37%)
of its WFC in Earth and environmental sciences.


NATURE INDEX 


The Nature Index database tracks the affiliations of high-quality scientific articles, and charts 
publication productivity for institutions and countries. Article count (AC) includes the total 
number of affiliated articles. Weighted fractional count (WFC) accounts for the relative 
contribution of each author to an article and applies a weighting to correct imbalances in the 
Index’s subject coverage. This Career Guide draws on Nature Index data derived from articles 
published in calendar year 2014. WFC is used throughout as the primary metric, because it 
provides an even basis for comparison. For more information, visit natureindex.com/faq 
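
The difference between AC and WFC described above can be illustrated with a minimal sketch. It assumes that every author of an article receives an equal share of credit and that the subject weighting is a single multiplier per article; the `Article` class, the `subject_weight` field and the sample data are hypothetical and are not taken from the Nature Index itself.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Article:
    # Country affiliation of each author, one entry per author (hypothetical data model).
    author_countries: List[str]
    # Assumed per-article subject weighting; 1.0 means no adjustment.
    subject_weight: float = 1.0


def article_count(articles: List[Article], country: str) -> int:
    """AC: number of articles with at least one author affiliated with the country."""
    return sum(1 for a in articles if country in a.author_countries)


def fractional_share(article: Article, country: str) -> float:
    """Share of an article credited to a country, assuming equal credit per author."""
    return article.author_countries.count(country) / len(article.author_countries)


def weighted_fractional_count(articles: List[Article], country: str) -> float:
    """WFC: sum of fractional shares, each scaled by the article's subject weight."""
    return sum(fractional_share(a, country) * a.subject_weight for a in articles)


# Illustrative articles only; not real Nature Index records.
articles = [
    Article(["China", "China", "Japan", "Japan"]),
    Article(["China"], subject_weight=0.5),
]

print(article_count(articles, "China"), weighted_fractional_count(articles, "China"))
```

Under these assumptions a country with many co-authored papers can have a high AC but a much lower WFC, which is why the two figures quoted for each country differ.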


S2 | NATUREJOBS CAREER GUIDE | ASIA-PACIFIC 2015 
© 2015 Macmillan Publishers Limited. All rights reserved 


SOURCES: Funding over time: OECD; Spending per researcher: UNESCO; Research quality: Scopus/Nature Index; Researcher density: UNESCO; Big Mac Index: The Economist; Happy Planet Index: Happy Planet Index/Gallup World Poll/UNDP/WWF


FUNDING OVER TIME


[Chart: R&D expenditure as % of GDP, 2000 to 2012]


Research and development (R&D) spending as a proportion of gross domestic
product (GDP) has risen in all six countries since 2000. The increases have
been largest in China and South Korea, both of which have almost doubled their
investment in R&D in the past decade.


RESEARCH QUALITY


[Chart: articles catalogued by the Nature Index (%)]


The ratio of each country’s article output in the Nature Index to its
biological- and physical-sciences output in the Scopus database is an
indicator of how much of its science is published in the journals most
favoured by researchers.


BIG MAC INDEX


[Chart: price of a Big Mac (US$) plotted against GDP per person (thousands US$)]

Countries above the ‘line of best fit’ have relatively high costs of living and those
below it relatively lower costs.


The Big Mac Index was created by The Economist as a light-hearted illustration of
the idea that over the long term, exchange rates should move so that the prices of
identical goods equalize in any two countries. Taking into account varying labour
costs, the dotted black line on the graph is the ‘line of best fit’ for Big Mac prices
plotted against GDP per person for 48 countries (plus the euro area). Assuming
that Big Mac prices are at least loosely representative of the prices of other goods,
countries with average costs of living would lie on the line.
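
The reading of the chart described above can be sketched in a few lines. This is a minimal illustration that assumes the ‘line of best fit’ is an ordinary least-squares line; the source does not say how the line was fitted, and the function names and the placeholder numbers below are hypothetical rather than real Big Mac Index data.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def cost_of_living(gdp_per_person, big_mac_price, slope, intercept):
    """Compare a country's Big Mac price with the price the line predicts."""
    predicted = intercept + slope * gdp_per_person
    if big_mac_price > predicted:
        return "relatively high"
    if big_mac_price < predicted:
        return "relatively low"
    return "average"


# Placeholder values: GDP per person (thousands US$) and Big Mac price (US$).
gdp = [10.0, 25.0, 40.0, 55.0]
price = [2.5, 3.4, 4.6, 5.1]

slope, intercept = fit_line(gdp, price)
for g, p in zip(gdp, price):
    print(g, p, cost_of_living(g, p, slope, intercept))
```

A country is classed by the sign of its residual: a price above the fitted line suggests a relatively high cost of living for its income level, and a price below the line suggests a relatively low one.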


SPENDING PER RESEARCHER


[Chart: R&D funding per researcher (thousands US$)]


Japan spends the most per researcher, and New Zealand spends by far the
least. Figures are the most recent available for each country, and are
normalized for purchasing power.


RESEARCHER DENSITY


[Chart: full-time researchers per 10,000 workforce. Annotation: United States and United Kingdom score 79]


South Korea is the region’s leader in the proportion of its workforce doing
research, and Singapore is not far behind. With 1.4 million researchers in
a population of 1.37 billion, it will be some time before China catches up.
Figures are the most recently available for each country.


HAPPY PLANET INDEX


[Chart: 2012 HPI rankings; axis end labels Costa Rica and Botswana]

China has low self-reported well-being but scores highly
overall because of a low per capita carbon footprint.

Australia comes top of the […] for well-being
but also has the […] carbon footprint.

New Zealand ranks top in the Asia-Pacific region thanks to high
self-reported well-being and a relatively low carbon footprint.

South Korea has middling scores for all three of the
Happy Planet Index’s component measures.

Singapore has a high carbon footprint and
middling self-reported well-being.


Produced by UK think tank the New Economics Foundation, the Happy Planet Index
(HPI) assesses how well countries provide inhabitants with long, happy and
sustainable lives. It ranks countries using a formula that takes into account
self-reported well-being, life expectancy and the WWF Ecological Footprint measure
of the per capita amount of land of average productive biocapacity required to
sustain national resource consumption. Numbers shown on the axis are the
countries’ 2012 HPI rankings out of the 151 nations included.
CHINA


With a declared aim to become a global science leader by 2050, the country’s unprecedented 
research spending splurge is creating a wealth of study and work opportunities. 


Shanghai, with its mix of historic and modern architecture, hosts the regional headquarters of many multinational companies. 


Within six months of moving to
Shanghai, physics postdoc Kevin
Huang was heading back to the
United States. Far from returning with his tail
between his legs, Huang had found several 
funding sources for a new project during his 
first month at Fudan University. He was making 
the trip to put together the ambitious collabora- 
tion with the Los Alamos National Laboratory, 
New Mexico, and the National High Magnetic 
Field Laboratory in Tallahassee, Florida. 
“Fudan University is building new chemistry,
physics, and environmental engineering buildings,
in stark contrast to the institutions in the US
where funding seems to be slashed every year
or stagnant at best,” says Huang, whose parents
immigrated from China to the US in 1983. 
China is set to become the world’s largest R&D 
investor by the end of the decade, opening job 
opportunities and driving up standards, says Lei 
Jiang from the Institute of Chemistry, Chinese 
Academy of Sciences. He sits on a selection committee
for the 1000 Youth Talent programme —
a scheme that offers good salaries, funding and 
welfare benefits to high-flying Chinese scientists 
aged under 40 who take positions in the country. 
Jiang says competition for places has become
intense since the scheme launched in 2011.
It is part of the broader 1000 Talents drive, which
began three years earlier to attract some 2,000
leading scientists, entrepreneurs and finance
experts of any nationality within 10 years.
“This year about 700 candidates were 


selected from more than 2,300 applications. 
The research quality was high, so it was difficult 
to eliminate anyone. More than 70% of them 
could get academic positions in the US. This is 
completely different from 15 years ago.” 
China's stated aim is to become a global 
leader in innovation by 2050. It has almost 
3,000 higher education institutions, and the 
number of university applicants has jumped 
from 5.3 million in 2002 to 9.5 million in 2015. 




The government has committed large sums to 
high-profile projects such as thorium-based 
nuclear power plants, as well as basic research 
spending, which has historically received less 
funding than in other developed countries. 

Researchers seeking funding can apply to 
national agencies such as the National Natural 
Science Foundation of China, Ministry of Sci- 
ence and Technology, and other bodies, once a 
year. Smaller sums can be applied for directly 
through their websites, while larger grants 
require a presentation in front of committee 
members. There are also programmes run by 
local governments. 

For scientists looking to do research in the 
commercial sector, China is enjoying strong 
private sector growth. Firms range from well-
known multinational pharmaceutical brands 
such as GlaxoSmithKline, and home-grown 
equivalents like Pharmaron, to innovative Chi- 
nese start-ups such as fast-growing biomedical 
firm WuXi AppTec. Management consultants 
McKinsey predict that by 2016 more than four 
out of five global life sciences organisations will 
be conducting R&D in China. 

Chuyao Peng received a PhD offer from 
Oxford University to research condensed matter 
physics, but chose to join Chinese tech start-up 
Shanghai Superconductor Technologies. “I’m
really excited about the opportunities for 
young scientists here. Imagine Silicon Valley 
in the early days — this is what is happening 
around me.” 

But despite China's fast growth and mod- 
ernisation, the country’s tightly controlled 
systems can still be challenging for foreigners 
or returnees. 

Faxian Xiu of Fudan University’s Depart- 
ment of Physics points out that the bureaucracy 
at institutions is irritating: “If you need new 
equipment, for example, the extensive paper- 
work needs to be done well ahead of time. It can 
take up to half a year to arrive.”

Internet restrictions in the workplace are also
a major headache. Even opening emails can take
a long time and the blocking of many websites
is frustrating. Other challenges include chronic
air pollution, poor water quality and high
population densities in urban centres.


SOURCES: Funding over time: OECD; Spending per researcher: UNESCO; Research quality: Scopus/Nature Index; Salary ranges: Interviews & job ads; Reported well-being: Gallup World Poll


WHERE TO WORK

The below charts represent the research output included in the 2014 Nature Index for ten of China's leading institutions,
and the contributions of different subjects, measured by weighted fractional count (WFC).

[Map and charts. Labels that survive extraction: Peking University; Tsinghua University; Institute of Physics; Fudan University; Nanjing University; Shanghai Institute …; Shanghai Jiao …; Huazhong University of …; Pharmaron; WuXi AppTec; Pfizer; BASF; HBIS, Handan. Key: life sciences; Earth and environmental sciences; chemistry; physics.]


Salaries in China are the lowest of the six profiled Asia-Pacific countries, however
the cost of living is also relatively low. Ranges are on the basis of reported data.

The Chinese report the lowest levels of life satisfaction
among residents of the six countries.


FUNDING OVER TIME

[Chart: R&D expenditure as % of GDP, 2000 to 2012.]

China has been the most consistent in its annual increases in the
proportion of GDP spent on R&D over the last decade.


SPENDING PER RESEARCHER

[Chart: R&D funding per researcher (thousands US$)*, axis from 0 to 250.]

Only New Zealand spends less per
researcher of the six nations, however
China is catching up rapidly.

*Figures are normalized for purchasing power,
and are the latest available for each country.


RESEARCH QUALITY

[Chart: Articles catalogued by the Nature Index (%), axis from 0 to 9.]

The proportion of China’s research output
published in journals considered by the
Nature Index is the lowest of the six.
BICHENG YANG

>EMPLOYER 

>DIRECTOR OF COMMUNICATION 
AND PUBLIC ENGAGEMENT 
>BEIJING GENOMICS INSTITUTE (BGI) 


What do you look for in candidates
for research positions?
We value innovation most. Science
is never set in stone but needs new
thoughts and ideas. Moreover, in
an era of big data there are lots of
breakthroughs to be made. We also
highly value those who have an open
mind when it comes to knowledge and
technologies.


Are your pay rates equivalent to
those in the West?
Salaries in China are becoming more
competitive. Companies have begun
offering enhanced packages for
promising positions with good career
prospects to attract academic talents
from across the world.


How could a move to China benefit a
researcher’s career right now?
China’s profile and stature are rising
in many fields of research. Scientists
benefit from abundant resources and
incentives provided by national and local
government to encourage research and
development. For example, in 2011 in
Shenzhen, where BGI’s headquarters is
located, the Shenzhen Municipal People’s
Government set out its twelfth ‘five-year
plan’ to support research and innovation
within six strategic emerging industries:
biotechnology, internet, renewable
energy, advanced materials, cultural
creativity and information technology.
Therefore, with China's encouraging
policies, researchers will have more
resources and funding, a broad platform,
and more professional opportunities.


Is BGI planning high levels of recruitment
in the immediate future?
The future of bioscience, and BGI, will
require multidisciplinary research talents
from the fields of physics, computer
science, mathematics, biology and
beyond. As talent is always the source of
innovation, we have already recruited
many scientists trained abroad, and
intend to recruit more.


The rise and rise of Chinese R&D spending

The 2007-08 global economic slowdown
saw banks go to the wall, property bubbles
pop, consumers tighten their belts and
tax revenues diminish. It was not long before
research in both the public and private sectors
was also cut back as science budgets shrank in
most countries — with one notable exception.

As shockwaves were felt across the globe in 
2009, China’s research sector saw a 28% increase 
in R&D funding on the previous year to reach 
1.7% of GDP. With domestic growth still strong, 
Chinese leaders were undeterred from using the 
country’s financial muscle to quickly turn it from 
a scientific backwater to a world leader. 

An increase in annual R&D growth of an 
average of 23% a year over the past decade has 
already seen China overtake the European 
Union in the spending race. In its Science Tech- 
nology and Industry Outlook 2014 report, the 
Organisation for Economic Co-operation and 
Development predicts it will take the top invest- 
ment position from the US by 2020. 

The results of the splurge are evident. China 
has become the second most prolific producer 
of peer-reviewed research articles according to 
the Royal Society, and has taken a global lead in 
the number of patent applications filed each year, 
data from Thomson Reuters show. 

It has also launched a series of major R&D- 
based projects such as a space station and the 
China brain project, dedicated to research into 
artificial intelligence and neurological diseases. 

“In some fields [China] is at the frontier
of technological knowledge, and the growth
of published research is extraordinary,”
writes Geoff Mulgan, chief executive of the UK’s
National Endowment for Science, Technology
and the Arts (Nesta) in the organisation's 2013
report China’s Absorptive State.

Historically, scientists’ salaries and bonuses
have been linked to the number of articles they
publish in high-impact journals. Materials
scientist Anthony Cheetham, vice-president of
the Royal Society, noted in a recent interview
for Nature that this professional emphasis can
occasionally ‘drive some bad behaviour’ such as
the falsification of data and plagiarism.

[Chart: China rising: the rise and rise of Chinese R&D spending.]

Steffen Duhm, a German physics professor 
who joined Soochow University in Suzhou, 
Jiangsu, three years ago, agrees that the publi- 
cation culture can stifle exploration. “My stu- 
dents expect from me that I know (and will tell 
them) the results of an experiment before they 
perform it.”

Likewise, the government set high targets for 
invention patents in its 15-year plan to achieve 
3.3 patents per 10,000 people by 2015 — a figure 
it looks set to surpass. 

However, the quality of some of these patents
has been called into question. The Economist 
noted that just 5% of the patents filed in China 
in 2013 were registered globally, rather than just 
locally — an indication of their perceived low 
value — suggesting that inventors are mainly 
striving to meet bureaucratic targets. Others say 
that despite this, producing commercially viable 
innovations is a priority for authorities. 


CHINA PUBLISHED A 15-YEAR
PLAN TO TRANSFORM
THE COUNTRY INTO A
SCIENCE AND TECHNOLOGY
POWERHOUSE BY 2050


The country’s scientific achievements are
inevitably part of its long-term strategic goals. It aims
to raise its R&D spend to 2.5% of GDP by 2020,
up from 1% in 2000. Meanwhile the US share of
global R&D spending fell from 34% to 30%
between 2001 and 2011. In 2006, the Chinese government
published a 15-year plan which included plans 
to transform the nation into a global science and 
technology leader by 2050. Assuming it is able 
to maintain the extraordinary rates of growth in 
R&D spending seen in recent years, few would 
bet against it achieving this goal. 

None of its economic rivals can match these
levels of growth.


[Chart legend: China, Japan, Australia, Singapore, South Korea, New Zealand, United States, European Union.]

China now spends more on
R&D than the 28 countries
of the EU combined and is
predicted to overtake the
US in 2020.

SOURCE: OECD


ENTRY REQUIREMENTS 
The employers of foreign nationals must 
apply for work permits on their behalf. As
in most countries, a case must be made that
there are no nationals capable of fulfilling the 
requirements of the job. Potential employees’ 
CVs, university transcripts and certificates are 
presented to local authorities. 

Once permission to apply for a work permit 
has been granted, individuals must apply 
for an employment Z visa for a single entry 
into China. After arriving in the country, the 
employer and their employee have one month 
to apply for an official residence visa and work 
permit. This can take up to three months to 
process and is valid for one year, after which it
must be renewed annually. 


ACADEMIC YEAR 

Autumn term runs from September to late 
January or early February, ending by the 
Spring Festival or Chinese New Year. 
Spring term runs from late February or 
early March through to late June or early 
July. Summer break is from July to the 
end of August. 


OPPORTUNITIES & CONTACTS 

1,000 Talent Plan of Foreign Experts, also 
known as the National Recruitment Program 
of Global Experts, and the 1,000 Youth Talent 
Plan: http://onestop.globaltimes.cn/what- 
is-the-1000-talent-plan-for-people-with- 
chinese-heritage-and-how-does-it-work/ 
100 Talents Program for overseas Chinese 
applicants: search@ucas.ac.cn 

naturejobs: http://www.nature.com/ 
naturejobs/science/jobs?q=china 

Science careers page: http://jobs. 
sciencecareers.org/jobs/life-sciences/china/ 


In 2014, China’s Three Gorges
dam broke the world record for
annual hydroelectric power
production, generating 98.8 billion
kWh of electricity.




STEFFEN DUHM

>FOREIGN EMPLOYEE
>PROFESSOR SINCE 2012
>INSTITUTE OF FUNCTIONAL NANO
AND SOFT MATERIALS (FUNSOM)
SOOCHOW UNIVERSITY


What encouraged you to move to China?
Before I took the position at Soochow
University, I visited for two months and
fell in love with Suzhou and its 2,500-year
history. The old quarters of the city remind
me of my hometown, Schwäbisch Hall,
although the population of Suzhou is a
hundred times larger and the history is four
times longer. Before I came to China I lived
in Japan for four years, but it is very difficult
for foreigners to get positions above the
postdoc level there, and in Germany there
are almost no tenure-track positions.


How does your daily life contrast with
living in the West?
My life is very Western in style. My
apartment is to Western standards, there
are some Western supermarkets, many
pubs, and many, many German engineers
here working at Bosch. If I wanted, I
could live an almost entirely German life
here, except for the air pollution and the
Chinese internet restrictions.


How easy have you found it to make
friends?
It has been somewhat difficult to make
Chinese friends. The pubs in Suzhou are
mainly frequented by expats. At work,
most of my colleagues are very friendly;
however, I get the impression most of
them are not interested in becoming close
friends with a foreigner — unless they've
spent some time in English-speaking
countries and are more familiar with
Western customs.


How did your family react to your move?
My mother and my sister were very shocked
when I told them. Although they were used
to the fact that I was far away, they did not
like the idea that I would live in China. But
my mother has since visited and feels much
more comfortable about it now.


Do you need to understand Mandarin to
live in China?
I have not learned Mandarin. I can
manage daily life like shopping, taking
the bus or going to restaurants without
the language, but it requires some
more planning. For example, I avoid
restaurants without picture menus or
taking a taxi spontaneously. In Japan, I
started dating a Chinese woman, who
has since moved here, and she helps
me with practical things like finding an
apartment or buying furniture.


Does the language barrier affect your 
working life? 

At work Chinese - in principle - is not 
required. All faculty members of our 
college studied and worked abroad 

for some time and can speak English 
very well. All the teaching is supposed 
to be done in English. However, as all 
the students are Chinese, some of the 
professors switch to Chinese in their 
lectures. The communication with some 
of the students in my research group 

isa little difficult and they are de facto 
supervised by the Chinese associate 
professor in our group. The biggest 
problems are the written regulations 
and announcements. I get great help, 
but it is impossible for staff to translate 
all the documents for me, so they have 
to decide what is important. Sometimes 
their guesses are wrong, or reflect their 
opinion. Another problem is that most of 
the grant proposals have to be written in 
Chinese and consequently my proposals 
have to be translated, which does not 
enhance their quality. 


What misconceptions do Westerners have 
about working as a scientist in China? 
There are still many prejudices against 
Chinese science. I tried to convince some 
German friends to join FUNSOM at a
faculty level, but they do not want to come 
to China because they think it might not 
help their future careers in Germany. 

For example, my PhD supervisor from 
Humboldt University was very, very 
skeptical and only came here the first time 
because we are friends. Now he is a chair 
professor of FUNSOM, visits here twice per 
year, and three students of FUNSOM will 
soon join his lab for their PhD.
DONGLAI FENG

>EMPLOYER 

>DIRECTOR OF THE STATE KEY 
LABORATORY OF SURFACE PHYSICS, 
FUDAN UNIVERSITY 


Why should foreign researchers consider 
moving to China? 

China is the fastest-growing place in terms 
of research funding, and the academic 
culture is transforming into a more 
healthy and creative one. At this time, 
there are great opportunities for scientists 
to practice their leadership skills and 
become pioneers. In the past ten years,
all the basic infrastructure has been set
up so labs are now as modern as those in
many developed countries, with the added 
advantage that researchers can easily afford 
administrative support. 


What cultural differences should those 
coming in from abroad be aware of? 
The culture of science is still in the 
developmental stage here. Hierarchy is 
an important part of Chinese culture, so 
foreigners quickly learn to be tactful. At 
lunch, professors may talk more about 
politics than science. During meetings, 
academics often prefer not to make critical 
comments of their peers because they are 
afraid they might embarrass someone 
who could harm their career in the 
future. However, things are changing and 
becoming much more meritocratic. 


‘THE ANTI-CORRUPTION 
CRACKDOWN HAS MADE 
PEOPLE MUCH MORE 
CAUTIOUS.’ 


Are there myths about working as a
scientist in China that aren’t true? 
Corruption has been a problem in the past, 
but the recent anti-corruption crackdown 
by the Chinese government has made 
people much more cautious. There also 
used to be an instant financial bonus for 
getting published in major journals such 
as Nature. Now the emphasis is on the 
research and the ‘paper bonus’ culture is 
largely gone.


Fudan Daxue, Campus, Shanghai, China. 


POSTDOCS IN CHINA

Good postdocs are the engines that drive
research, and while Western universities
are unable to offer places to all
those who want postdoc work, in China there
are plenty of positions but a lack of high-quality
candidates to fill them.

Poor pay and prospects have historically
driven the country’s cleverest PhD graduates
overseas to gain experience in foreign labs.
Without experience abroad, ambitious Chinese
researchers stand little chance of obtaining
tenure at home, and are paid poorly. Many potential
postdoc candidates from abroad are put off by an
outdated impression of Chinese science among
ambitious graduates.

“Although science in China is growing
tremendously in quality and quantity, the image
of Chinese science makes it difficult for me to
recruit foreign researchers, especially postdocs,”
explains physicist Steffen Duhm of Soochow
University. “As my research is very fundamental
and specialised, this is important, as it is difficult
to find experts within China.”

Many foreigners are concerned that taking
a postdoc position in China could be
detrimental to their careers. Back in 2011, China's
minister of human resources and social
security Yin Weimin pointed out that foreign
postdocs account for 65% of the US postdoctoral
workforce and 40% in the EU, but only 1% of
China's postdocs.

“There has not been a good track record for
people coming out of China as postdocs to land
good academic jobs,” points out Ying Liu, a
physicist who holds positions at both Penn State
University and Shanghai Jiao Tong University.

No-one is suggesting that the problem can be
solved entirely with financial rewards, but they
might be a good place to start, says Donglai Feng,
director of the State Key Laboratory of Surface
Physics at Fudan University. His highest paid
postdocs receive around $34,000 [210,000RMB]
per year with additional housing subsidies. He
currently has 30 postdocs in his laboratory, 25%
of whom are from foreign institutes.


‘THE GENERAL RESEARCH 
LEVEL HERE IS BECOMING AS 
HIGH AS IN OTHER COUNTRIES.’ 


“The situation is changing dramatically
because the general research level here is
becoming as high as in other countries,” says Feng. “My
vision in the future is to recruit the best
postdocs either from China or from other advanced
countries. If there are not enough permanent
positions for young people in these countries,
China is a good option.”


Between 2011 and 2013, China 
consumed more concrete than the 
US in the entire 20th century. 


JTB MEDIA CREATION, INC./ALAMY 


TON KOENE/ALAMY 
JAPAN


Being a team player is paramount in Japanese culture, and despite a shift in funding structures
to short-term contracts, foreign researchers are attracted by cutting-edge science. 


Socialising with colleagues is a major part of teamwork required in the Japanese working culture. 


When Louis Irving celebrated his
marriage with a traditional Shinto
wedding in the Japanese coastal town of
Sendai in April, the first person to toast the
couple wasn’t an old friend or relative from home. It
was Shinobu Satoh, his boss of five years.

“In Japan, your boss is considered one of
your main sponsors in life, helping you to gain
promotions and move up the food chain,” says
Irving, a biologist at the University of Tsukuba,
north of Tokyo. “It was nice for my wife’s family
to hear that I am doing a good job at work and
have a secure future.”

Irving first arrived in Japan in 2003 as a PhD
student from the University of Aberdeen with a 
keen interest in the enzyme RuBisCo, responsi- 
ble for converting atmospheric carbon dioxide 
into fuel for plants. 

“The best place for measuring the synthesis
and degradation of RuBisCo was Tohoku
University in Sendai,” says Irving, who returned to
Japan in 2007 and soon took up a postdoctoral
fellowship through the Japan Society for the
Promotion of Science (JSPS).

Today he works as an assistant professor at
Tsukuba, a position created under the
government’s ambitious Global 30 project, which aimed
to bring 300,000 international students to Japan.
Last year, the government launched the Top
Global Universities programme to encourage
universities to become more international and
to raise their global rankings through student
and faculty exchanges, joint degrees and research
collaborations, employment of more foreign
staff and provision of more lectures in English.


‘YOUR BOSS IS 
CONSIDERED ONE OF YOUR 
MAIN SPONSORS IN LIFE.’ 


The University of Tsukuba, for example, plans 
to increase its proportion of foreign researchers 
from the current 5% to 18% by 2023, but some 
believe that such aspirations are being under- 
mined by a shift in strategic funding priorities. 

“The 1980s was Japan-bashing time,” says
Atsushi Sunami, a specialist in science,
technology and innovation policy at Tokyo’s National
Graduate Institute for Policy Studies. “The world
criticized Japan for free-riding on fundamental
ideas created in the United States and Europe,
so the government decided to invest
more in basic research. But when the financial
crisis hit, people became concerned about
wasting taxpayers’ money.”


SHIGEO KOYASU

>EMPLOYER
>EXECUTIVE DIRECTOR
>RIKEN


What are the benefits of working at a
national research institute in Japan as
opposed to a university?
RIKEN is an environment in which
researchers can collaborate across
disciplines, unlike traditional Japanese
university departments in which
researchers have to ask permission just
to talk to people from other labs. RIKEN
also maintains excellent facilities. As an
immunology researcher, for example,
I need to use expensive cell sorters and
DNA sequencers, and my centre has
several of these.

Foreign researchers account for around
19% of researchers at RIKEN, higher than
the national average but lower than the
target of 30%. Having a heterogeneous
population is good for science.


What are the differences from working
culture in the West?
Japan does not have a culture of salary
negotiation and salaries are much less
flexible than in the US. People are just
told what their salary is, based on age and
experience. Performance schemes have
been introduced, and if your performance
is poorer than average, you may take a cut.


Is it important to speak Japanese?
Researchers do not need to speak Japanese
to work in laboratories at RIKEN. We
translate documents, hold meetings in
English and assist researchers in banks and
government offices. Foreign researchers
can apply for funding in English. Difficulty
for non-Japanese people becomes apparent
when getting drivers’ licences and social
security numbers, for example.


Nature Index; Salary ranges: Interviews & job ads; Reported well-being: Gallup World Poll


WHERE TO WORK

The below charts represent the research output included in the 2014 Nature Index for ten of Japan's leading institutions,
and the contributions of different subjects, measured by weighted fractional count (WFC).

By 2000, funding was being directed primar- 
ily at research with clear societal and economic 
returns. Universities now get a greater propor- 
tion of their income from government-com- 
missioned research projects, and they recruit 
a larger number of researchers on temporary 
contracts rather than in more secure, long-term 
arrangements. “A lot of talented young people 
are looking for jobs elsewhere,” says Sunami. 

Still, for researchers such as Duncan McMillan,
on a two-year JSPS fellowship at the University
of Tokyo, Japan offers the chance to work
with pioneers in state-of-the-art laboratories.
McMillan studies the process by which protons 
move between proteins in the membrane-bound 
respiratory chain to produce energy in cells. 

His quest has seen him focus on microbi- 
ology in New Zealand, bioelectrochemistry 
in the United Kingdom and most recently, 
single-molecule microscopy in Germany. 
He was one of 240 foreign fellows selected 
under the JSPS scheme in 2014 — the top 10% 
of applicants. 


“I realized that the technologies in Japan
were above and beyond what I could access in
Germany,” says McMillan.


‘LABS OPERATE 
AS ONE LARGE 
TEAM WITH STRICT 
HIERARCHIES’ 


Culturally, what sets labs apart in Japan is the 
way they operate as one large team with strict 
hierarchies, says McMillan. “You have to fit 
within that structure while maintaining your 
independence as a scientist — it can require a 
bit of give and take.”

Team activities include dinner parties and 
football, but the real test of a team player in 
McMillan’s laboratory takes place every Mon- 
day morning when undergrads and postdocs 
get together to clean. “Vacuuming office cor- 
ners, mopping floors, removing rubbish and 
putting away glassware; it all happens incredi- 
bly quickly. Sometimes it can be difficult to find 
a job for yourself.”


The Japan Meteorological Agency monitors 
1,500 seismometers and 4,300 seismic- 
intensity meters, allowing it to report on 
the location and intensity of an earthquake 
within 90 seconds of its occurrence. 


ENTRY REQUIREMENTS 

Foreign nationals can apply for a Highly Skilled 
Foreign Professionals visa, which grants them 
(and a spouse) permission to work and live
in Japan for five years, multiple re-entry, and
consideration for permanent residence status 
after just five years (instead of the usual ten 
years). Employers must first issue a Certificate 
of Eligibility after seeing evidence of education 
level, expected salary and age and research 
achievements. Employees must include this 
certificate when they apply for their visa at their 
local Japanese embassy. The process takes 
between one and three months. 

On entering Japan, new arrivals receive a 
residence card that they must carry at all 
times. They can leave and re-enter Japan
for a maximum period of one year as many
times as they wish. 


78% of homes in Japan
have an electronic toilet
seat, more than those
with a dishwasher, but
fewer than have
flat-screen televisions.


ACADEMIC YEAR 

Autumn term starts in October and ends in 
late February or March with a winter break 
from late December to early January. Spring 
break in March is followed by spring term 
which runs from April to late July. Summer 
break is in August. 


OPPORTUNITIES & CONTACTS 

The Japan Society for the Promotion of Science 
(JSPS) Postdoctoral Fellowship for Overseas 
Researchers: https://www.jsps.go.jp/english/e- 
fellow/postdoctoral.html 

JSPS Grants-in-Aid for Scientific Research: 
https://www.jsps.go.jp/english/e-grants/ 
Japan Science and Technology Agency (JST) 
Strategic Basic Research Programs: http:// 
www.jst.go.jp/kisoken/en/about/index.html 
JST Impulsing Paradigm Change through 
Disruptive Technologies Program: 
http://www.jst.go.jp/impact/en/ 

RIKEN Special Postdoctoral Researcher 
Program: http://www.riken.jp/en/careers/ 
programs/spdr/ 

RIKEN International Program Associate: 
http://www.riken.jp/en/careers/programs/ipa 
KOZ MASUMITSU

>EMPLOYER 

>DEPUTY GENERAL MANAGER 
>LSI MEDIENCE CORPORATION 


What employment opportunities are
there for foreign researchers in private
companies in Japan?
As a clinical-testing company, most of
our research involves developing cancer
biomarkers and diagnostic reagents for
commercial use. We have a few foreign
nationals working in our research centres,
but they were hired through the traditional
recruitment system after graduating from a
Japanese university. All of them can speak
and write Japanese fluently.

Overall, foreigners account for a small 
number of researchers in the private 
sector, but this varies by industry, with 
fewer opportunities in the chemicals and 
semiconductor industries, for example, 
than in the information technology sector. 
The government is discussing incentives 
for foreign companies to set up research 
and development centres based on models 
in Singapore, South Korea and China. 


How does the research environment differ 
between the private and public sectors? 
There is a lot of competition for funding
in the public sector, which some may say
is healthy, but it can also be a researcher's
nightmare. Many young researchers,
especially in highly innovative fields, are
hired on fixed-term contracts for only
two to three years. This makes it difficult
to retain experienced staff — a major 
headache even for Nobel prizewinning 
stem-cell researcher Shinya Yamanaka. The 
funding environment is much less shaky in 
the private sector, once a company decides 
that a product is important for the business. 


What’s your advice to foreign researchers?
When in Rome, do as the Romans do.
Cultural adjustment to daily life and
the working environment in Japan can
be challenging for foreigners. Japanese 
companies value teamwork and it can be 
difficult to figure out how decisions are 
made within the team structure. Sometimes 
patience is needed. The best practice is not 
to hurry — just wait and see. Build good 
relations with your colleagues, go out with 
them for dinner and drinks, and eventually 
you will begin to see results.


NEAL PRITCHARD 
AUSTRALIA


Remote and beautiful, Australia is still a magnet for international researchers despite recent
cuts to the budgets of some large scientific institutions and reduced grant opportunities. 


CSIRO’s ASKAP antennas at the Murchison Radio-astronomy Observatory in Western Australia. 


What springs to mind when you
picture Australia? Maybe it’s the
robust beauty of its coastline or the
laid-back drawl of the locals. Perhaps you're
not imagining an internationally competitive
scientific powerhouse.

While the nation’s 23.6 million people earn it a
position of 51 in the world’s population ranking,
Australia reached number 11 for research-publication
output in the global SCImago Journal &
Country Rank in 2013, and last year came fourth
in the Scientific American Worldview’s
biotechnology innovation scorecard.

Australia is a magnet for international
researchers, with around 45% of scientists
publishing research having come there from
somewhere else, according to a survey published
in Nature Biotechnology in 2012. One such
researcher is Amanda Moffett, who moved from
the US to become a post-doctoral research
associate studying galaxy evolution at Perth's
International Centre for Radio Astronomy Research.

“Astronomy is a small field, so everything is
about how good a fit a place is for your
interest,” says Moffett. The pay for postdocs is also
almost double what she would be receiving had
she remained in the US, although much of this
is accounted for by the increased cost of living
compared with other places, she points out.

Construction of the Square Kilometre Array,
a radio telescope project 50 times more sensitive
than any other radio instrument, is due to
begin in 2018 in Australia and South Africa.
Australia is also home to the Australian Synchrotron
particle accelerator, and one of the world’s
highest voltage heavy ion accelerators, as well as
world-leading facilities in the emerging field of
metabolomics.


‘IT’S NICE THAT IT’S A BIT SMALL
SO YOU GET TO KNOW A LOT OF
OTHERS IN THE SAME FIELD’


For Moffett, Australia’s relatively small 
population makes apparent the lack of women 
in science. “In the States the ratio [of men to 
women] is not that different, but the absolute 
size of the community here is much smaller so 
you feel like there are fewer higher-level women 
to look up to and have as mentors,” Moffett says.

Sally Lavender, who moved to Australia from 
the UK with her partner after completing her 
PhD and is now a researcher at CSIRO Oceans 
and Atmosphere in Melbourne, has
FRANK GANNON

>EMPLOYER 

> DIRECTOR AND CEO 

>QIMR BERGHOFER MEDICAL 
RESEARCH INSTITUTE 


What’s good about working in Australia? 
There's a strong commitment to doing 
quality research and moving it through the 
system to have a clinical impact. It also has 
a great quality of life. 


What are the challenges? 

The world needs to know what you’re
doing, so researchers from here travel a
lot to international scientific meetings.
That is more necessary than if you were
in mainland Europe and you could drop
in on nearby labs for interest and profile
raising. Funding is a challenge, but tell me a
country where it isn’t.


What are the main concerns for 
international job-seekers? 

Firstly, the distance — but because of Skype 
it is easy to remain in contact. The second 
question that comes up is, why come here 
as opposed to elsewhere? That really is 
because of the quality of work. Then it gets 
down to visas. The 457 visa, you get that 
very readily, it’s four years and you and 
your partner can work for that period. It 
can be renewed, and you can convert it into 
a longer term here or permanent. 


How can jobseekers improve chances? 
Funding is one thing people should be
looking for. If you get that, you’ve got your
own salary, which is always very welcome,
but you also then are able to say that
you had that as an award. Organizations
such as the European Molecular Biology
Organization offer fellowships to anybody
from Europe to go anywhere in the world,
and that includes Australia.


Nature Index; Salary ranges: Interviews & Job Ads; Reported well-being: Gallup World Poll 
The below charts represent the research output included in the 2014 Nature Index for ten of Australia's leading institutions,
and the contributions of different subjects, measured by weighted fractional count (WFC). 


KEY: Life sciences; Earth & environmental sciences; Chemistry; Physics; Major industry employer.

*Each slice represents the proportion that each subject area contributes to an institution’s overall WFC. Subject areas can overlap, so the total percentage may exceed 100%.


SALARIES


[Chart: salary ranges in thousands of US$ for three grades: professor, lecturer or assistant professor, and starting or postdoc salary.]


Australian researchers enjoy the highest salaries in the six countries, according to 
reported data, but the country is also among the most expensive places to live. 
REPORTED WELL-BEING


Australians report the highest levels of satisfaction with 
life among residents of the six countries. 


FUNDING OVER TIME 
Some major scientific institutions in Australia have faced budget cuts as
research and development (R&D) funding as a proportion of gross domestic 
product (GDP) has dipped from a high point of 2.2% in 2008. 
Australia invests the second-highest
amount per researcher, after Japan. 
output included in the Nature Index is a
little over average for the six countries. 
come to appreciate the country’s scientific
microcosm. “It’s nice that it’s a bit small, so you
get to know a lot of other scientists that are in the
same field as you, [and see] the same people at
domestic conferences,” she says. Having recently
had her first child, Lavender says her experience
of maternity leave and the public health system
was “brilliant”.

Less positive, Lavender says, is that funding
constraints are tightening, and not just for climate-related
research. Indeed, there have been
major ups and downs in the funding of science
in recent years, with cuts to some of Australia’s
biggest scientific institutions including the
CSIRO and the Cooperative Research Centres,
research consortia of industry, university
and government organizations designed to
boost economic growth.

The axe has also fallen on the Australian 
Research Council’s Future Fellowships, which 
used to support 200 mid-career researchers 
including overseas applicants for four years. That 
has recently been cut to just 50 fellowships, with 
preference given to Australian applicants. 


On the flip side, the Federal Government has 
this year committed to funding an AUD$20 
billion Medical Research Future Fund, and 
thrown a last-minute financial lifeline to the 
Australian Synchrotron. 

“The really high-quality science is still getting
funded,” says James Murphy, who moved
to Australia from New Zealand to undertake a
PhD in biological chemistry, and is now a group
leader at the Walter and Eliza Hall Institute of
Medical Research in Melbourne. He has been
involved in assessing grant applications for the
National Health and Medical Research Council,
and says that although there has been a
recent decline in the number of grants offered,
“there are still initiatives and a diversity of funds
to which you can apply”.

Murphy recommends sourcing funds internationally
before coming to Australia. “It’s a
fine training ground; facilities-wise it’s world
class,” he says, “but bring as much money
as you can from a foreign body and use that
to fund your work and your salary while
you’re here.”


Australian-born physicist Sir William
Lawrence Bragg and his father, Sir William
Henry Bragg, who jointly won a Nobel
Prize in 1915, are the only father-son team
to have ever been awarded the honour.


The document most commonly used by 
researchers coming to Australia is the 457 
visa. This is a temporary visa enabling skilled 
workers to come with their dependents for 
employment in their chosen field for up to 
four years. They must be either sponsored 
by an employer or be a good fit for an 
available position. These visas are generally 
processed within 90 days of submission 
and currently cost AUD$1035 (US$800). 
Another option is the non-sponsored
189 skilled independent visa, which
costs $3520. English-speaking qualified
applicants under the age of 50, who could 
help fill the country’s skill shortages, are 
assessed under a points-based system. 


Australia’s mainland coastline 
is 35,000 km long, with more 
than 10,000 beaches. 


Most Australian universities have two terms, the 
first of which begins in late February or early 
March and runs until May or June with a break 
in April. The mid-year recess is late June until 
late July. The second term runs from then until 
late November, with a break in late September 
or early October. 


OPPORTUNITIES & CONTACTS 

The Australian Research Council’s Discovery 
Projects — Discovery International Awards: 
http://www.arc.gov.au/general/international_researchers.htm

The Australian Research Council’s Australian
Laureate Fellowships: http://www.arc.gov.au/ncgp/laureate/laureate_default.htm

The Group of Eight (Go8) European Fellowship
scheme: https://go8.edu.au/article/go8-media-release-european-fellows-2015

The Endeavour Scholarships and Fellowships:
https://internationaleducation.gov.au/endeavour%20program/scholarships-and-fellowships/international-applicants/pages/international-applicants.aspx

The Connecting Australian European Science &
Innovation Excellence programme:
http://www.caesie.org/
JOHN HEASMAN

>EMPLOYER 

> PRINCIPAL RESEARCH ENGINEER 
> COCHLEAR LTD 


What does Australia have going for it as a
place to live and work?

For a small nation, real benefits come
from focusing on areas where you have
strengths. Australia is leading the world in
some areas of research innovation, such
as applied clinical care, cancer treatment,
and vaccines. The climate and the lifestyle
are a huge draw, particularly in Sydney and
Melbourne. It’s also a very multicultural
society now, which has really facilitated
people moving from Europe, the US and
other countries. Cochlear has people from
71 nationalities in its Sydney headquarters.


Are there any downsides? 

The geographical separation is a challenge. 
Though teleconferencing and the Internet 
have revolutionized communication, it’s 
not the same as working face to face. 


What are the main questions international 
jobseekers have? 

Generally, overseas applicants first ask 
about who to approach for opportunities. 
Our advice is to look for Centres of 
Excellence with Australian Research 
Council funding. Generally they're groups 
that have good academic and industrial 
ties, and collaborate very closely. The 
main aim of those relationships is to work
on cutting-edge research and make sure
it is applied and relevant to commercial
industry. We’re big believers in the power
of basic research, but the real power is
translating that into clinical practice. If
you’re interested in working in private
industry, you’re best to approach some of
the bigger players, such as Cochlear and
ResMed, which have strong research and
development programmes.


What do foreign job seekers bring to 
Australia? 

Through dealing with their supervisors 
and the running of their research, they 
establish ties with other collaborators, so 
often bring connections with them. The 
other advantage is knowledge transfer. 
We're isolated geographically and it's 
important to not just rely on publications, 
but have human interactions.
SINGAPORE


A hub for proactive and dynamic professionals, funding and salary levels in this ambitious 
city-state make the opportunities on offer hard to resist for many. 


HUFTON AND CROW 
Nanyang Technological University’s new Learning Hub, which features state-of-the-art classrooms.


“[It’s] such an international, positive environment,”
says Sara Sandin. A Swedish assistant
professor in the school of biological
sciences at Nanyang Technological University
who came to Singapore in 2012 to set up an
electron microscopy laboratory, Sandin speaks
highly of her adopted home. “Research budget
cuts elsewhere make it hard to maintain a positive
and creative working environment. You
don’t have that in Singapore.”

Between 2011 and 2015, 16.1 billion
Singapore dollars (US$12 billion) was invested
in science and research by the island city-state,
a 20% increase on the previous five-year period.
“The long-term goal is to turn Singapore into
a world-renowned research hub,” says Terence
Ong, of Contact Singapore, a government
agency that seeks to attract people to live and
work on the island. “Singapore is committed to
keeping up momentum for money going into
research.” In a world of static or shrinking budgets,
this makes Singapore hard to resist for many.

Science is increasingly carried out in collaborations
between academia and industry. In 2014,
for example, global IT giant Fujitsu set up the
Centre for Excellence on Sustainable Urbanisation
in collaboration with the Singapore
Management University and the government’s
Agency for Science, Technology and Research
(A*STAR), which plays a leading part in encouraging
these combined efforts. “We believe that
this collaboration will bring a lot of value, some
that we cannot even predict,” says Rio Yamaura,
vice president of the New Solutions Business
Division at Fujitsu.

Although Singapore entered a period of
deflation at the end of last year, high inflation
between 2010 and 2014 made it one of the
world’s most expensive cities to live in. A one-bed
apartment in the city centre costs anywhere
from SGD2500 to more than SGD8000 per
month. But high salaries and low income tax
rates (maximum 20%) allow for a comfortable
standard of living.

“We advise incoming scientists to use our cost-of-living
calculator before arriving so they can
prepare themselves,” says Ong. There is assistance
through housing allowances and some
on-campus housing offered by universities to
researchers for, on average, one-third of the
market rate.


Despite having no oil,
Singapore is the world’s
largest manufacturer of
offshore oil-rig platforms.


ENTRY REQUIREMENTS

Once a foreign scientist has accepted a job
offer, they need an Employment Pass from
the Ministry of Manpower. Their employer
must do the paperwork once it has received
proof of education documents and CV
details, and the employee presents it to the
ministry. Those wanting to explore work
opportunities in Singapore can apply for
a Personalised Employment Pass, which
allows professionals to remain in Singapore
for six months.
BERTIL ANDERSSON

>EMPLOYER 

>PRESIDENT 

>NANYANG TECHNOLOGICAL UNIVERSITY 


How has Singapore changed since
you arrived?

When I came here, I joked that Europeans 
talk too much, but Singaporeans act. And 
the joke became my reality. There was 
good money and quick decisions, so things 
grew rapidly. The investment in academia 
is monumental if you look at the size of 
the country, and that has all happened in 
the last 10-15 years. I’ve recruited a lot of 
professors in science and engineering, and 
very few leave Singapore. People love it, 
professionally and personally. 


What are the challenges for foreign 
scientists? 

I call it Asia-lite: a mix of East and West,
so Europeans and Asians feel at home.
It’s very humid, so you have to live in an
air-conditioned box. From a professional
standpoint, this part of Asia has a lot of
corruption. The government has fought it
with heavy bureaucracy. You may need
11 signatures to buy one lab chemical.


Are there any cultural differences 
foreigners should be aware of? 

People don't always say what they think. 
Singaporeans don’t want to lose face. If I
tell my professors that they did something 
wrong, they take it very badly, so you have 
to be diplomatic. It's a secular society and 
religion is private. Singaporeans appreciate 
harmony and the government plays a 
watchful role in maintaining this. 


How can foreigners prepare? 

Have some money saved and a credit card! 
It’s not cheap to live here. You need to be a
proficient English speaker.
WHERE TO WORK


The below charts represent the research output included in the 2014 Nature Index for ten of Singapore's leading institutions,
and the contributions of different subjects, measured by weighted fractional count (WFC). 
SOUTH KOREA


Focused on recruiting overseas researchers and encouraging basic science,
this Asian tech hub is using research to drive development.


When it comes to science, South Korea
is an ambitious player. It invests more
of its GDP in research and development
than any other developed nation except
Israel. This has helped propel the country’s rapid
rise from war-torn nation to G20 summit host
and home of powerhouses like Samsung and LG.

But only within the past decade have universities
begun to employ more scientists from
abroad, boosted by a government programme
established in 2009 that committed around
$700 million to help universities bring talent
from abroad. This recruitment drive is not without
complication, not least because of cultural
and language barriers. Foreigners often feel disadvantaged
seeking funding as many agencies
require applications to be submitted in Korean.

The attempt to attract more foreign recruits
coincides with efforts to boost basic research,
after decades of focusing on applied research
and economic growth. President Park Geun-hye
has pledged to ease the path for tech start-ups,
limit the influence of the tech giants and
kickstart a “creative economy”. Halfway through
her five-year term, the results have been mixed.
The Institute for Basic Science, a research centre
founded in Daejeon in 2011 and aimed at rivalling
Japan’s RIKEN, has since opened other centres,
but scientists say that grant pools are spread thin,
and levels of individual grants are flat or falling.

On the positive side, junior researchers in
South Korea enjoy relatively great freedom and
are not expected or obliged to work on projects
run by senior academics.

Foreigners are often surprised by the array of
equipment at their disposal. “For biology, there
is everything we need in Korea,” says Eric di
Luccio, a structural biologist from France who
joined Kyungpook National University in Daegu
in 2010. That includes national facilities such as
the Pohang Accelerator Laboratory that are not
yet on the radar of researchers who might trek to
Japan or Taiwan for instrument time. “There is a
lot of infrastructure and research going on, and
it’s not widely recognized yet,” says di Luccio.


Among the fruits of R&D investment are companies like LG Electronics, whose headquarters are in Seoul.


South Korea’s Jellyfish
Elimination Robotic
Swarm (JEROS) provides
an automated solution to
problematic jellyfish blooms.
Robots hunt in packs, each
capable of pulverizing 400
kg of jellyfish every hour.


Thanks to South Korea’s push to make its
scientific workforce more international,
the government freely approves work
visas for researchers, though anecdotal
reports suggest there can be hold-ups
at the university or department level
if institutions are not accustomed to
employing foreigners and are unfamiliar
with the required documentation. These
visas need to be renewed annually. Foreign
nationals whose parents or grandparents
emigrated from South Korea, or who
were once citizens themselves (including
those adopted abroad), are eligible for an
Overseas Korean F-4 residence visa. This
grants nearly all the benefits of citizenship
except voting rights, and is not specific to a
particular employer.
HAKIM DJABALLAH
>EMPLOYER 

>CHIEF EXECUTIVE OFFICER 
>INSTITUT PASTEUR KOREA 


Are there good opportunities for 
postdocs in Korea? 

Coming here for three years or so is a great 
opportunity for a postdoc considering it as 
a little bit of a break to do some teaching,
enjoy life in Korea and explore Asia. You 
may struggle in terms of publications 
unless you're at Seoul National University. 
Korea has been struggling with publishing 
in high-profile journals because editors 
and reviewers get more sceptical with the 
things that they send because of past fraud. 
I think many editors have had their fingers 
burned, and they often consider our 
researchers guilty until proved innocent. 


What are the main obstacles for foreign 
scientists in Korea? 

The lack of transparency and fairness of 
the granting and peer review systems. 
Foreign researchers in Korea suffer a lot 
because of the old boy network. It's often a 
case of, “We went to kindergarten together,
so I’ll fund you.” You would never expect
foreign scholars to be required to write 
grant applications and to have to defend 
their proposal in Korean, yet often that is 
expected. My researchers here suffer from 
that. We feel discriminated against. 


What advice would you give to scientists 
coming to Korea? 

Negotiate everything in writing before 
you get here. The Koreans have a 
tendency to say, ‘It’s OK, we’ll take care of
it once you get here.’ Scientists need to get
agreements on everything from salary to 
health insurance, all the way to if they are 
married and want to bring their family, 
how they’re going to do it.
WHERE TO WORK


The below charts represent the research output included in the 2014 Nature Index for ten of South Korea’s leading institutions, 
and the contributions of different subjects, measured by weighted fractional count (WFC). 
Life satisfaction among South Koreans is just below
average for the six countries.


the same grades elsewhere in the region, according to our ranges based on
reported data.


FUNDING OVER TIME


[Chart: R&D expenditure as % of GDP]


South Korea has roughly doubled its proportion of gross domestic product
(GDP) spent on research and development (R&D) in the past decade.


— SPENDING PER RESEARCHER — 


[Chart: R&D funding per researcher (thousands US$)*]


Despite investing the highest proportion of
its GDP in R&D, South Korea spends only
the third most in the region per researcher.


*Figures are normalized for purchasing power, 
and are the latest available for each country. 


——— RESEARCH QUALITY ——— 
Only China scores lower than South Korea
in the proportion of scientific papers 
published in Nature Index journals. 
NEW ZEALAND


A natural laboratory like no other, the small country is renowned for its Earth science and
agricultural research. It has a multi-cultural environment and is proud of its collaborativeness. 


Perennially snow-capped and tantalizingly
steep, Aoraki/Mount Cook is a magnet for
elite mountaineers. New Zealand’s highest
peak also symbolizes the spectacular natural
environment that draws many to study and work
in this South Pacific island nation.

The geologically unstable juncture of two
tectonic plates, which has pushed Aoraki to
3,724 metres, regularly shakes the country and
underpins much of the science at which New
Zealand excels. It was the vantage point for studying
geophysics and Earth sciences that attracted
Pegah Faegh-Lashgary from Iran to do her PhD
at the Victoria University of Wellington in 2012.
“The university staff were thoughtful and things
like accommodation were all sorted. The only
negative has been the distance from Europe and
my home country,” says Faegh-Lashgary.

Faegh-Lashgary would consider staying on
after her PhD, but expresses a widespread concern:
the prospects for postdocs and for pursuing
an academic career are limited. “New Zealand
has very few postdocs,” says Siouxsie Wiles, a
University of Auckland microbiologist who
relocated from London six years ago.

Yet it may be that New Zealand’s small population
has helped foster its ingenuity, known as
the ‘number 8 wire’ tradition, a reference to a
gauge of fencing wire that has been adapted
for uses beyond its original purpose. It is an
attribute that has helped the country punch above its weight
in the Nobel laureate stakes. New Zealand has
three: Ernest Rutherford, for chemistry (1908),
Maurice Wilkins, physiology or medicine (1962)
and Alan MacDiarmid, chemistry (2000).

About one quarter of New Zealand’s R&D
spending is accounted for by seven government-owned
companies called Crown Research
Institutes (CRIs). The two largest are focused
on plant and food research, and agricultural
research. Fonterra, a giant dairy-farming cooperative,
is responsible for almost one third of
global dairy exports. With some 400 employees,
it is the country’s largest private organization
doing research. New Zealand spends 1.14% of
its GDP on R&D, much less than the OECD
average of 2.4%. There is widespread support for
government efforts to lift that to 2%.


Aoraki/Mount Cook is one of the many wonders attracting researchers to New Zealand.


New Zealand-born Ernest
Rutherford, recipient of the
1908 Nobel Prize in Chemistry,
discovered the proton and was
the first person to split an atom.


ENTRY REQUIREMENTS

New Zealand has been proactive in developing
its economy since the global financial crisis,
but has many specialist skills shortages. It is
recruiting overseas candidates to fill positions
in fields such as medicine, engineering,
information and communications technology
and agriculture and forestry. Foreign applicants
can apply for jobs before they have a visa, and
employers usually assist in obtaining one. For
those with scientific qualifications, this will be
the Skilled Migrant Visa, for which people aged
20-55, who are proficient in English and who
meet certain health requirements, are eligible.
To work in some areas, notably engineering and
medical and health professions, there is also
a legal requirement to be registered with the
appropriate professional body.
MICHAEL MCWILLIAMS
>EMPLOYER 

>CHIEF EXECUTIVE 
>GNS SCIENCE 


What is particularly attractive about 
working as a scientist in New Zealand? 
From the Earth-sciences perspective, it’s
a fantastic natural laboratory. We have all
manner of active features that represent 
both significant geological hazards to the 
country and great research opportunities. 
New Zealand is a truly amazing geological 
sandpit, and scientists want to come. It’s a 
great place for collaboration. 


Is the geographical isolation a problem for 
career development? 

None of us is isolated; we have the tools to 
collaborate as long as we're flexible about 
time zones. Europe and North America are 
less than a day away, Asia is closer, and at 
any time 15% of my scientists are overseas. 


What cultural aspects should those 
coming to work from abroad be aware of? 
New Zealand is a multicultural nation like 
many others, so a cosmopolitan approach 
is a benefit. A desire to learn about and
respect Maori culture is important. 


GNS Science is one of the seven Crown
Research Institutes (CRIs), government-owned
companies that do one quarter
of R&D in New Zealand. How does this
unusual structure affect the work they do?
The CRI business model, in which
research is funded by government and
private sources and administered on 
corporate principles, is unusual, but not 
unique. Australia’s CSIRO, the Fraunhofer 
Institutes in Germany, and the Dutch TNO 
are all similar. The principal advantage
is the ability to make clear, visible and
valuable contributions of national benefit.
WHERE TO WORK


The below charts represent the research output included in the 2014 Nature Index for eight of New Zealand’s leading institutions,
and the contributions of different subjects, measured by weighted fractional count (WFC).
Although others in the Asia-Pacific region have seen large rises in research
funding, the proportion of GDP spent on R&D has remained consistently 
low in New Zealand. 
— SPENDING PER RESEARCHER —


[Chart: R&D funding per researcher (thousands US$)*]


New Zealand spends the least per 
researcher and less than half of that 
spent by regional leader Japan. 


*Figures are normalized for purchasing power, 
and are the latest available for each country. 


——— RESEARCH QUALITY ——— 
Measured by the proportion of its scientific
papers included in Nature Index journals,
New Zealand’s research quality is below
the average for the six countries.